Summary


The concept of identifying codes was originally developed as a means of
pinpointing a specific critical node in a network, given its relationship with a
special set of codeword nodes in a graph. Applications such as fault diagnosis
and sensor networks have found identifying codes extremely useful. For example,
a network of smoke detectors with an accurate identifying code allows us to
determine the exact location of a fire given only the set of detectors that have
been triggered. Unfortunately, the problem of finding identifying codes is
computationally expensive, and so real-world use so far has been minimal.
To deal with this problem, we propose the use of a special network structure:
de Bruijn networks.

When deploying a wireless network, some highly desirable properties are (a)
many short paths between any two nodes, and (b) relatively few edges. One type
of network structure that satisfies both of these properties simultaneously is the
class of de Bruijn networks. De Bruijn networks have been utilized in many
applications, such as fault-tolerant networks and peer-to-peer networks, among
others. Because of their unique properties, many algorithms that are normally
time-consuming perform exceptionally well on de Bruijn networks. This class
of networks has yet to be considered from an identifying code perspective, and
a complete examination of the problem is needed from both a theoretical and
an algorithmic perspective. Our initial theoretical results have shown promise.
Introduction


Consider  a  house  with  several  different  smoke  detectors.  In  many  cases,  a  fire 
will  trigger  not  just  one  smoke  detector,  but  several.  Based  on  a  specific  set  of 
detectors  going  off,  can  we  accurately  pinpoint  the  room  in  which  there  is  a  fire? 
This  is  an  example  of  an  identifying  code.  Each  smoke  detector  has  a  certain 
radius  that  it  covers,  and  some  areas  will  be  covered  by  more  than  one  smoke 
detector.  If  each  area  of  the  house  has  a  different  set  of  detectors  covering  it, 
then  the  set  of  smoke  detectors  that  go  off  completely  determines  where  the  fire 
is.  With  respect  to  sensor  networks,  if  we  can  find  a  minimal  identifying  code 
for  the  network,  then  we  can  easily  determine  the  location  of  the  critical  node. 

In terms of graph theory, let G be an undirected graph. Let $B_t(v)$ be the
ball of radius t around vertex v, i.e. the set of all vertices that are at distance at
most t from v. A code is a set of vertices called codewords. Given a code S, the
identifying set of a vertex v is $ID_S(v) = B_t(v) \cap S$. The code S is an identifying
code if every identifying set in the graph is unique, that is, for vertices u, v we have
u = v if and only if $ID_S(u) = ID_S(v)$ [14].
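The definition above can be checked mechanically on a small graph. The following is a minimal sketch, not part of the original text: the function names and the four-room example are hypothetical, and the check also insists that every identifying set be non-empty, which is the stricter convention adopted later in Definition 3.11.

```python
from collections import deque

def ball(adj, v, t):
    """Return B_t(v): all vertices within distance t of v (breadth-first search)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == t:
            continue  # do not expand past radius t
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)

def identifying_sets(adj, code, t=1):
    """Map each vertex v to ID_S(v), the intersection of B_t(v) with the code S."""
    code = set(code)
    return {v: frozenset(ball(adj, v, t) & code) for v in adj}

def is_identifying_code(adj, code, t=1):
    """True when all identifying sets are non-empty and pairwise distinct."""
    ids = identifying_sets(adj, code, t)
    return all(ids.values()) and len(set(ids.values())) == len(ids)

# Hypothetical example: four rooms in a row, with detectors in rooms 0, 1 and 2.
rooms = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_identifying_code(rooms, {0, 1, 2}))  # True
print(is_identifying_code(rooms, {1, 2}))     # False: rooms 1 and 2 trigger the same detectors
```

With detectors only in rooms 1 and 2, a fire in either of those rooms triggers both detectors, so the two rooms cannot be told apart.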

While identifying codes have been considered for several specific types of
graphs, they have yet to be examined for de Bruijn graphs. De Bruijn graphs
have been useful in many applications. A de Bruijn graph with string length n and
alphabet size d has a vertex for every string of length n over the set $\{0, 1, \ldots, d-1\}$.
An edge is drawn starting at vertex $(u_1, u_2, u_3, \ldots, u_n)$ and ending at vertex
$(v_1, v_2, v_3, \ldots, v_n)$ whenever $u_2 = v_1, u_3 = v_2, \ldots, u_n = v_{n-1}$. In other words,
$(u_2, u_3, \ldots, u_n) = (v_1, v_2, \ldots, v_{n-1})$. We will refer to this graph as B(d,n).
Note that this definition produces a directed graph in which multiple edges and
loops are allowed.
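As an illustration of the edge rule, the following sketch builds the out-adjacency lists of B(d,n). Representing vertices as tuples of integers and the name `de_bruijn` are choices made here for the example, not notation from the text.

```python
from itertools import product

def de_bruijn(d, n):
    """Out-adjacency lists of directed B(d, n); vertices are tuples of length n.

    There is an edge u -> v exactly when u[1:] == v[:-1], so the out-neighbors
    of u are obtained by dropping its first letter and appending any letter.
    """
    return {u: [u[1:] + (a,) for a in range(d)]
            for u in product(range(d), repeat=n)}

graph = de_bruijn(2, 3)
print(len(graph))                      # 8
print(graph[(0, 1, 1)])                # [(1, 1, 0), (1, 1, 1)]
print((0, 0, 0) in graph[(0, 0, 0)])   # True: the all-zero string carries a loop
```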
Definitions


3.1  General  Graph  Theory 

Definition 3.1. The distance from vertex u to vertex v in a graph G is given
by d(u,v), and is defined as the length of the shortest path from u to v in G.
If G is a digraph, then we require this path to be a directed path. We define
d(u,u) = 0.

Definition 3.2. Two vertices $u, v \in V(G)$ are adjacent if either d(u,v) = 1 or
d(v,u) = 1. We denote this $u \sim v$.

Definition 3.3. Let $v \in V(G)$. The open in-neighborhood of v is given
by $N^-(v) = \{u \in V(G) \mid d(u,v) = 1\}$, and the closed in-neighborhood
is given by $N^-[v] = N^-(v) \cup \{v\}$. The open out-neighborhood is given by
$N^+(v) = \{u \in V(G) \mid d(v,u) = 1\}$, and the closed out-neighborhood of
vertex v is given by $N^+[v] = N^+(v) \cup \{v\}$. In an undirected graph, the open
neighborhood of v is $N(v) = \{u \in V(G) \mid d(u,v) = 1\}$ and the closed
neighborhood of v is $N[v] = N(v) \cup \{v\}$.

Definition 3.4. The in-ball of radius t centered at vertex v is the set
$B_t^-(v) = \{u \in V(G) \mid d(u,v) \le t\}$, and the out-ball of radius t centered at
vertex v is the set $B_t^+(v) = \{u \in V(G) \mid d(v,u) \le t\}$. In an undirected graph, the
ball of radius t centered at vertex v is the set $B_t(v) = \{u \in V(G) \mid d(v,u) \le t\}$.

Definition 3.5. Two distinct vertices $u, v \in V(G)$ are called t-twins if $B_t^-(u) = B_t^-(v)$.
If the graph has no t-twins, then G is called t-twin-free. For an undirected
graph, we use the same definition with in-ball replaced with ball.

Definition 3.6. Given a subset $S \subseteq V(G)$, the S t-identifying set for vertex
v is given by $ID_S(v) = B_t^-(v) \cap S$. For an undirected graph, we use the same
definition with t-in-ball replaced with t-ball.
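The directed notions above can be explored computationally. The sketch below is illustrative only: the helper names are hypothetical, and it takes the in-ball of v to be the set of vertices with a directed path of length at most t into v, matching the in-neighborhood of Definition 3.3.

```python
from collections import deque
from itertools import product

def de_bruijn(d, n):
    """Out-adjacency lists of directed B(d, n) with tuple vertices."""
    return {u: [u[1:] + (a,) for a in range(d)]
            for u in product(range(d), repeat=n)}

def in_ball(adj, v, t):
    """Vertices u with a directed path of length at most t from u to v."""
    preds = {x: [] for x in adj}
    for x, outs in adj.items():
        for y in outs:
            preds[y].append(x)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] < t:  # only expand vertices strictly inside the radius
            for y in preds[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
    return frozenset(dist)

def identifying_set(adj, code, v, t=1):
    """The intersection of the in-ball of v with the code."""
    return in_ball(adj, v, t) & frozenset(code)

def twin_pairs(adj, t=1):
    """All unordered pairs of distinct vertices with equal in-balls of radius t."""
    balls = {v: in_ball(adj, v, t) for v in adj}
    vs = sorted(adj)
    return [(u, v) for i, u in enumerate(vs) for v in vs[i + 1:]
            if balls[u] == balls[v]]

g = de_bruijn(2, 2)
print(twin_pairs(g, 1))       # []
print(len(twin_pairs(g, 2)))  # 6
```

In B(2,2) every vertex reaches every other vertex within two steps, so for t = 2 all in-balls coincide and every pair of vertices is a pair of twins; no code can separate them at that radius.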


3.2  Types  of  Identifying  Sets 

Definition 3.7. A t-dominating set is a set $S \subseteq V(G)$ such that for all
$v \in V(G)$ we have $B_t^-(v) \cap S \neq \emptyset$. This is equivalent to saying that
$\bigcup_{v \in S} B_t^+(v) = V(G)$. For an undirected graph, we replace $B_t^-(v)$ and
$B_t^+(v)$ with $B_t(v)$. The t-domination number, denoted $\mathrm{dom}_t(G)$, is the
minimum size of a t-dominating set in G.

Definition 3.8. [1] A k-tuple dominating set of a graph is a subset S of
vertices such that every vertex is 1-dominated by at least k vertices in S. The
k-tuple domination number, denoted $\gamma_{\times k}(G)$, is the minimum size of a k-tuple
dominating set in G.

Definition 3.9. A distinguishing set is a set S of vertices such that for all
pairs of distinct vertices $u, v \in V(G)$ we have either:

1. $u \in S$, or

2. $v \in S$, or

3. $N(u) \cap S \neq N(v) \cap S$.

Definition 3.10. A locating-dominating set is a set S of vertices such that
for all pairs of distinct vertices $u, v \in V(G)$ we have either:

1. $u \in S$, or

2. $v \in S$, or

3. $B_t^-(u) \cap S \neq B_t^-(v) \cap S$.

For an undirected graph, replace t-in-ball with t-ball.

Definition 3.11. A t-identifying code is a t-dominating set $S \subseteq V(G)$ such
that for all pairs of distinct vertices $u, v \in V(G)$ we have $ID_S(u) \neq ID_S(v)$. (Note that
since S is a t-dominating set, we are also requiring that $ID_S(x) \neq \emptyset$ for all $x \in V(G)$.)
The variable t is referred to as the radius of the identifying code. We denote the
size of a minimum identifying code by $\gamma^{ID}(G)$.

Some authors will allow a t-identifying code to admit at most one empty
identifying set. Unless otherwise specified, we will require every identifying set to
be nonempty.
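For very small graphs a minimum identifying code can be found by exhaustive search. The following sketch is only an illustration (the function names are hypothetical, and the search is exponential in the number of vertices); it uses in-balls made of the vertices with a directed path of length at most t into v, and it requires every identifying set to be non-empty.

```python
from collections import deque
from itertools import combinations, product

def de_bruijn(d, n):
    """Out-adjacency lists of directed B(d, n) with tuple vertices."""
    return {u: [u[1:] + (a,) for a in range(d)]
            for u in product(range(d), repeat=n)}

def in_balls(adj, t):
    """For every v, the set of vertices u with a directed path of length <= t to v."""
    preds = {v: [] for v in adj}
    for u, outs in adj.items():
        for v in outs:
            preds[v].append(u)
    balls = {}
    for v in adj:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            if dist[x] < t:
                for y in preds[x]:
                    if y not in dist:
                        dist[y] = dist[x] + 1
                        queue.append(y)
        balls[v] = frozenset(dist)
    return balls

def is_identifying_code(balls, code):
    """All identifying sets must be non-empty and pairwise distinct."""
    code = frozenset(code)
    ids = [balls[v] & code for v in balls]
    return all(ids) and len(set(ids)) == len(ids)

def minimum_identifying_code(adj, t=1):
    """Smallest t-identifying code by exhaustive search, or None if none exists."""
    balls = in_balls(adj, t)
    for size in range(1, len(adj) + 1):
        for code in combinations(sorted(adj), size):
            if is_identifying_code(balls, code):
                return set(code)
    return None

print(len(minimum_identifying_code(de_bruijn(2, 3))))  # 4
print(minimum_identifying_code(de_bruijn(2, 2), t=2))  # None: all vertices are 2-twins
```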

Definition 3.12. A k-robust t-identifying code is a t-identifying code
$S \subseteq V(G)$ such that removal of any set $T \subseteq S$ with $|T| \le k$ preserves the
t-identifying properties, i.e. $S \setminus T$ is a t-identifying code.

Definition 3.13. [8] A directed resolving set is a set S so that for each
$v \in V(G)$ there exist $u_1, u_2 \in S$ so that $d(v,u_1) \neq d(v,u_2)$. The directed
metric dimension is the minimum size of a directed resolving set.

Definition 3.14. [6] A determining set or fixing set is a set S so that the
only automorphism that fixes the vertices of S pointwise is the trivial
automorphism.

Note that an alternate definition for a determining set is a set S for which
whenever $f, g \in \mathrm{Aut}(G)$ so that f(s) = g(s) for all $s \in S$, then f(v) = g(v)
for all $v \in V(G)$. That is, every automorphism is completely determined by its
action on a determining set.

In the above definitions, if t is omitted from the notation (i.e. identifying
code instead of t-identifying code), then it is assumed that t = 1. Note also that
these definitions have corresponding counterparts for undirected graphs.


3.3  Strings  and  the  Directed  de  Bruijn  Graph 

We  will  be  considering  various  types  of  vertex  subsets  on  the  class  of  directed 
de  Bruijn  graphs.  The  following  definitions  will  be  useful  in  working  with  this 
class  of  graphs.  We  will  use  the  notation  [x]  =  {1,2,... ,  x}. 

Definition 3.15. Let $A_d = \{0, 1, \ldots, d-1\}$ and let $A_d^n$ be the set of all strings
of length n made up of letters of $A_d$. When d is clear from context we will use
A and $A^n$ respectively.

Definition 3.16. The directed de Bruijn graph, denoted B(d,n), has vertex
set $A_d^n$. An edge from vertex $x_1x_2 \ldots x_n$ to vertex $y_1y_2 \ldots y_n$ exists if and
only if $x_2x_3 \ldots x_n = y_1y_2 \ldots y_{n-1}$.

Definition 3.17. The concatenation of two strings $x = x_1x_2 \ldots x_\ell$ and
$y = y_1y_2 \ldots y_k$ is given by $x \oplus y = x_1x_2 \ldots x_\ell y_1y_2 \ldots y_k$.

Definition 3.18. The concatenation of sets of strings S and T is given by
$S \oplus T = \{x \oplus y \mid x \in S \text{ and } y \in T\}$.

Definition 3.19. The prefix of a string $x = x_1x_2 \ldots x_n$ is the substring
$x_1x_2 \ldots x_{n-1}$, denoted by $x^-$.

Definition 3.20. The suffix of a string $x = x_1x_2 \ldots x_n$ is the substring
$x_2x_3 \ldots x_n$, denoted by $x^+$.

Definition 3.21. When discussing substrings of a string $x_1x_2 \ldots x_n$, we will
use the notation x(a : b) to denote the substring $x_a x_{a+1} \ldots x_b$.

Definition 3.22. If a string $x = x_1x_2 \ldots x_n$ contains a constant substring
$x(a : b) = zz \ldots z$, then we will denote the consecutive letters as $z^{b-a+1}$, the
constant raised to the power denoting length. This will also be used for repeated
substrings, such as $0101 \ldots 01 = (01)^k$.

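Definition 3.23. Let $w = w_1 \ldots w_n \in A_d^n$. Define $w^{(t,m)} = w_1^{(t,m)} w_2^{(t,m)} \ldots w_n^{(t,m)}$
such that:

$$w_i^{(t,m)} = \begin{cases} w_t + m \pmod{d}, & \text{if } i = t; \\ w_i, & \text{otherwise.} \end{cases}$$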

Definition 3.24. Let $w = w_1 \ldots w_n \in A_d^n$ and $\ell \in \mathbb{Z}^+$ such that $n \ge 2\ell$. Then
we say that w has period length $\ell$ if $w_i = w_{i+\ell}$ for all $i \in [n - \ell]$. If we have
$n < 2\ell$, then we say that w has almost period length $\ell$.

Definition 3.25. Let $w \in A_d^n$, and suppose that w has period length $\ell$, and
does not have period length k for any $k < \ell$. Then w is called $\ell$-periodic.

Definition 3.26. Let $w \in A_d^n$. If there exists some $\ell > \frac{n}{2}$ and word $w' \in A_d^{2\ell - n}$
such that $w \oplus w'$ is $\ell$-periodic, then w is called almost $\ell$-periodic.
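The string operations in Definitions 3.23 to 3.25 are easy to experiment with. The sketch below uses hypothetical helper names and represents strings as tuples of integers; `has_period` checks only the equality condition $w_i = w_{i+\ell}$, leaving the distinction between a period and an almost period (which depends on comparing n with $2\ell$) to the caller.

```python
def has_period(w, l):
    """Check w_i == w_{i+l} for every i in [n - l] (positions counted from 1)."""
    return all(w[i] == w[i + l] for i in range(len(w) - l))

def minimal_period(w):
    """Smallest l >= 1 passing has_period; l = len(w) always passes vacuously."""
    return next(l for l in range(1, len(w) + 1) if has_period(w, l))

def bump(w, t, m, d):
    """The string w^{(t,m)}: add m (mod d) to the t-th letter, counting from 1."""
    return w[:t - 1] + ((w[t - 1] + m) % d,) + w[t:]

w = (0, 1, 0, 1, 0, 1)
print(has_period(w, 2))      # True
print(minimal_period(w))     # 2
print(bump(w, 2, 1, 2))      # (0, 0, 0, 1, 0, 1)
```

Changing a single letter of a periodic string usually destroys the short period, which is the phenomenon the lemmas below make precise.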

We  now  provide  some  lemmas  regarding  string  properties  that  will  be  used 
later  in  the  paper. 

Lemma 3.27. [10] Let $\ell_1 > \ell_2$ and w be a word of length $n \ge \ell_1 + \ell_2 - \gcd(\ell_1, \ell_2)$.
If w has periods (or almost periods) of length $\ell_1$ and $\ell_2$, then w has a period of
length $\gcd(\ell_1, \ell_2)$.
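Lemma 3.27 can be sanity-checked by brute force over short words. The sketch below is an illustration under the assumption that "period" is read as the equality condition $w_i = w_{i+\ell}$ for all valid i, regardless of whether n is at least $2\ell$; the function names are hypothetical.

```python
from itertools import product
from math import gcd

def has_period(w, l):
    """Check w_i == w_{i+l} for every valid position i."""
    return all(w[i] == w[i + l] for i in range(len(w) - l))

def lemma_counterexamples(d, max_n):
    """Words over d letters, up to length max_n, that would contradict Lemma 3.27."""
    bad = []
    for n in range(2, max_n + 1):
        for l1 in range(2, n + 1):
            for l2 in range(1, l1):
                g = gcd(l1, l2)
                if n < l1 + l2 - g:
                    continue  # the lemma makes no claim for words this short
                for w in product(range(d), repeat=n):
                    if has_period(w, l1) and has_period(w, l2) and not has_period(w, g):
                        bad.append((w, l1, l2))
    return bad

print(lemma_counterexamples(2, 8))  # []
```

The length condition matters: the word 010 has length 3, satisfies the equality condition for lengths 2 and 3, and does not have period 1, but 3 is smaller than 3 + 2 - 1.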

Lemma 3.28. [4] Let $\ell_1 > \ell_2$ and w be a word of length $n \ge \ell_1 + \ell_2$. If w has
a period (or almost period) of length $\ell_1$ and $w^{(k,m)}$ has a period of length $\ell_2$ for
some $m \in A_d$, then there is $m' \in A_d$ such that $w^{(k,m')}$ has a period of length
$\gcd(\ell_1, \ell_2)$.

Lemma 3.29. Let $w \in A_d^n$ such that w is $\ell_1$-periodic or almost $\ell_1$-periodic. Let
$m \in [d-1]$ and also $k \in [n]$ with $k \le n - \ell_1$ or $k > \ell_1$. Then for any $\ell_2 \le \frac{n}{2}$
with $\ell_1 > \ell_2$ and $n \ge \ell_1 + \ell_2$, it is not possible that $w^{(k,m)}$ is $\ell_2$-periodic.

Proof. We proceed by contradiction, and suppose that $w^{(k,m)}$ is $\ell_2$-periodic. We
have two cases. First, if $k > \ell_1$, then by Lemma 3.28, there exists some $m' \in A_d$
such that $w^{(k,m')}$ has period of length $\gcd(\ell_1, \ell_2)$. Then we have the following
chain of equalities.

$w_k = w_{k-\ell_1}$ since w has a period of length $\ell_1$

$= w_{k-\ell_2}$ since $w^{(k,m')}$ has a period of length $\gcd(\ell_1, \ell_2)$

$= w_k^{(k,m)}$ since $w^{(k,m)}$ has a period of length $\ell_2$

Since $m \neq 0$ we have $w_k^{(k,m)} \neq w_k$, and hence this is a contradiction. For our second
case, when $k \le n - \ell_1$, we note the following.

$w_k = w_{k+\ell_1} = w_{k+\ell_2} = w_k^{(k,m)}$

This is also a contradiction. Therefore we must have that $w^{(k,m)}$ is not
$\ell_2$-periodic. □

Lemma 3.30. Let $w \in A_d^n$ such that w has period length $\ell_1$ for some $\ell_1 \le \frac{n}{2}$.
For all $m \in [d-1]$ and for all $i, j, k \in [n]$ with $i \le k \le j$, and for all $\ell_2 < \ell_1$
with $j - i + 1 \ge \ell_1 + \ell_2$ and with either $k \ge i + \ell_1$ or $k \le j - \ell_1$, we must have
that $w^{(k,m)}(i : j)$ does not have period $\ell_2$.

Proof. Define $w' = w(i : j)$ and $w'^{(k-i+1,m)} = w^{(k,m)}(i : j)$, and then apply Lemma
3.29 to compare the two strings. □

Lemma 3.31. Let n = 2t and let $u \in A_d^n$. If u has period length t and for some
$\ell \le t$ and $m \in A_d$, we find that $u' = u^{(t,m)}(t + 1 - \ell : n - 1)$ is $\ell$-periodic, then
we must have that $\ell$ divides t and $u' \oplus (u_n + m)$ has period $\ell$.

Proof. First, we note that $U_m = u^{(t,m)}(1 : n-1) \oplus (u_n + m)$ clearly has period length t, and so
$U_m(t + 1 - \ell : n - 1)$ has almost period length t. Additionally, since
$U_m(t + 1 - \ell : n - 1) = u'$, we know that $U_m(t + 1 - \ell : n - 1)$ is $\ell$-periodic. Hence by Lemma
3.27, $U_m(t + 1 - \ell : n - 1)$ has period of length $p = \gcd(t, \ell)$. However since $u'$
is given to be $\ell$-periodic, the minimum period length is $\ell$ and so we must have
that $p = \ell$ and thus $\ell$ divides t.

To show that $u' \oplus (u_n + m)$ has period $\ell$, we note that $u'$ has period $\ell$, and
that $u'_\ell = u_t^{(t,m)} = u_t + m$. Having period $\ell$ implies that $u'_k = u_t + m$ for all k
that is divisible by $\ell$. Since $u_n + m$ is the $(t + \ell)$th letter in $u' \oplus (u_n + m)$, and
this is divisible by $\ell$, we need that $u_n + m = u_t + m$ in order for $u' \oplus (u_n + m)$
to have period $\ell$. But this is given to be true since u has period t. □

The two following lemmas are useful in working with distances in B(d,n)
and their proofs are self-evident.

Lemma 3.32. In B(d,n) there is a directed path of length $t \le n$ from x to y if
and only if x(t + 1 : n) = y(1 : n - t). That is, if and only if the rightmost n - t
letters of x are the same as the leftmost n - t letters of y.

Lemma 3.33. In B(d,n) if vertices $x \neq y$ have the same prefix, then for all
$u \notin \{x, y\}$, d(u,x) = d(u,y). In particular, $B_t^-(x) \setminus \{x\} = B_t^-(y) \setminus \{y\}$ for all
$t \le n$.
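Both lemmas can be confirmed on small cases by comparing the overlap criterion of Lemma 3.32 with distances found by breadth-first search. The following sketch is illustrative; the function names are hypothetical and vertices are tuples of integers.

```python
from collections import deque
from itertools import product

def de_bruijn(d, n):
    """Out-adjacency lists of directed B(d, n) with tuple vertices."""
    return {u: [u[1:] + (a,) for a in range(d)]
            for u in product(range(d), repeat=n)}

def overlap_distance(x, y):
    """Least t such that the last n - t letters of x equal the first n - t letters of y."""
    n = len(x)
    return next(t for t in range(n + 1) if x[t:] == y[:n - t])

def bfs_distances(adj, source):
    """Directed distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def same_prefix_pairs_agree(d, n):
    """Lemma 3.33: if x != y share a prefix, d(u, x) == d(u, y) for all other u."""
    vertices = list(product(range(d), repeat=n))
    return all(overlap_distance(u, x) == overlap_distance(u, y)
               for x in vertices for y in vertices
               if x != y and x[:-1] == y[:-1]
               for u in vertices if u not in (x, y))

adj = de_bruijn(2, 3)
print(all(bfs_distances(adj, x)[y] == overlap_distance(x, y)
          for x in adj for y in adj))    # True
print(same_prefix_pairs_agree(2, 3))     # True
```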
3.4 Examples

[Figure: example panels titled Identifying Code, Locating-Dominating Set, Dominating Set, and Distinguishing Set.]


3.5  de  Bruijn  Results 

In  this  section,  we  list  some  general  results  concerning  the  de  Bruijn  graph  that 
were  determined  in  our  search  for  special  vertex  sets. 

Lemma 3.34. The strings in $B_t(x)$ for $x = x_1x_2 \ldots x_n$ must be one of the
following three types.

1. x;

2. $[d]^g \oplus x_{b-f+1} \ldots x_{n-f} \oplus [d]^{b-g}$ with $b \ge f$, $b \ge g$, $f + b + g \le t$;

3. $[d]^{f-c} \oplus x_{b+1} \ldots x_{n-f+b} \oplus [d]^c$ with $f \ge b$, $f \ge c$, $b + f + c \le t$.

Proof. All strings in $B_t(x)$ can be described by following forward or backward
edges. The strings of type (1) are reached by taking no moves. All other strings
(types (2) and (3)) are reached by taking either moves of type FBF
(forward-backward-forward) or BFB (backward-forward-backward). We will describe
shortest paths within these confines. We define f steps forward from vertex
$x_1x_2 \ldots x_n$ as reaching vertices in the set:

$[d]^f \oplus x_1 \ldots x_{n-f}$.

We define b steps backward from vertex $x_1x_2 \ldots x_n$ as reaching vertices in the
set:

$x_{b+1} \ldots x_n \oplus [d]^b$.

If FBF is the shortest path to reach some vertex y from x, then we must
follow f edges forward, b edges backward, and g edges forward, with the
constraints that $b \ge f$, $b \ge g$, and $f + b + g \le t$. Following these sequences, we
arrive at strings of type (2).

If BFB is the shortest path to reach some vertex y from x, then we must
follow b edges backward, f edges forward, and c edges backward, with the
constraints that $f \ge b$, $f \ge c$, and $b + f + c \le t$. Following these sequences, we
arrive at strings of type (3). □

Lemma 3.35. For any $y \in B(d,n)$ with $d \ge 3$, there exists some vertex x such
that d(y,x) = n.

Proof. We proceed by induction on n and show that if the claim is true in
B(d,n) for $n \ge 2$, then the claim is true for B(d, n + 2).

Base Case: n = 2. Since $d \ge 3$, our vertex $y = y_1y_2$ can use at most two
symbols from our alphabet. Suppose that $z \in [d] \setminus \{y_1, y_2\}$. Then d(y, zz) = 2.

As our induction proceeds from string length n to n + 2, we require an
additional base case of n = 3. If our vertex $y = y_1y_2y_3$ only uses two
distinct symbols from [d], then the string $x = a^n$ where $a \in [d] \setminus \{y_1, y_2, y_3\}$
satisfies d(y,x) = 3. Otherwise, we must have $[d] = \{y_1, y_2, y_3\}$. Then the
vertex $x = (y_2)^3$ satisfies d(y,x) = 3.

Induction Step: Let $\bar{y} = y_0 \oplus y \oplus y_{n+1}$ be arbitrary. By the induction
hypothesis, there exists some $x \in B(d,n)$ such that d(x,y) = n. We will show
that $d(\bar{y}, \bar{x}) = n + 2$, where $\bar{x} = x_0 \oplus x \oplus x_{n+1}$ with $x_0 \in [d] \setminus \{y_n, y_{n+1}\}$
and $x_{n+1} \in [d] \setminus \{y_0, y_1\}$. We will show that $\bar{x} \notin B_{n+1}(\bar{y})$ using Lemma
3.34 and considering each type of path and resulting string individually.

1. $\bar{x} = \bar{y}$. Not possible since $x \neq y$.

2. FBF-type.

First, from Lemma 3.34, we know that since d(x,y) = n there cannot
exist any choice of f, b, g such that $f + b + g \le n - 1$, $b > 0$, $b \ge f$,
and $b \ge g$ such that

$x \in [d]^g \oplus y_{b-f+1} \ldots y_{n-f} \oplus [d]^{b-g}$.

In other words, we must have

$y_{b-f+1} \ldots y_{n-f} \neq x_{g+1} \ldots x_{g+n-b}$

for all such choices of f, b, g.

Now we will show that there does not exist an FBF-path of length
n + 1 or less between $\bar{x}$ and $\bar{y}$. Fix some f, b, g such that $f + b + g \le n + 1$,
$b > 0$, $b \ge f$, and $b \ge g$. From Lemma 3.34 all vertices
$z_0z_1 \ldots z_{n+1}$ that can be reached by an FBF-path with parameters
(f, b, g) from $\bar{y}$ must have

$y_{b-f} \ldots y_{n-f+1} = z_g \ldots z_{g+n+1-b}$.

(a) If f = 0, b = k, and g = 0, then we consider $1 \le k \le n - 1$
and $n \le k \le n + 1$ separately. First, if $1 \le k \le n - 1$, then
our induction hypothesis with parameters (0, k, 0) tells us that
$x_1 \ldots x_{n-k} \neq y_{k+1} \ldots y_n$ when we examine FBF-paths with
parameters (0, k, 0) from y. Hence we cannot have $x_0 \ldots x_{n-k+1} = y_k \ldots y_{n+1}$,
and so no such FBF-path exists between $\bar{x}$ and $\bar{y}$.
Next, if $n \le k \le n + 1$, then since $x_0 \neq y_n, y_{n+1}$, we will never
have $x_0x_1 = y_ny_{n+1}$ or $x_0 = y_{n+1}$, and so again no such
FBF-path exists in B(d, n + 2).

(b) If $f \ge 1$, then we have $b \ge 2$. In this case, in order for such
an FBF-path to exist from $\bar{y}$ to $\bar{x}$ we need $x_g \ldots x_{g+n-b+1} = y_{b-f} \ldots y_{n+1-f}$.
However our induction hypothesis with parameters (f - 1, b - 1, g) shows
$x_{g+1} \ldots x_{g+n-b+1} \neq y_{b-f+1} \ldots y_{n-f+1}$,
and so no such FBF-path exists in B(d, n + 2).

(c) If $g \ge 1$, then again we must have $b \ge 2$. In this case, in order
for such an FBF-path to exist we must have $x_g \ldots x_{g+n-b+1} = y_{b-f} \ldots y_{n+1-f}$.
However our induction hypothesis with parameters (f, b - 1, g - 1) tells us that
$x_g \ldots x_{g+n-b} \neq y_{b-f} \ldots y_{n-f}$,
and so no such FBF-path exists in B(d, n + 2).

Hence we cannot have an FBF-path of length less than n + 2 between
$\bar{y}$ and $\bar{x}$ in B(d, n + 2).

3. BFB-type.

First, from Lemma 3.34, we know that since d(x,y) = n there cannot
exist any choice of b, f, c such that $b + f + c \le n - 1$, $f > 0$, $f \ge b$,
and $f \ge c$ such that

$x \in [d]^{f-c} \oplus y_{b+1} \ldots y_{n-f+b} \oplus [d]^c$.

In other words, we must have

$y_{b+1} \ldots y_{n-f+b} \neq x_{f-c+1} \ldots x_{n-c}$

for all such choices of b, f, c.

Now we will show that there does not exist a BFB-path of length n + 1
or less between $\bar{x}$ and $\bar{y}$. Fix some b, f, c such that $b + f + c \le n + 1$,
$f > 0$, $f \ge b$, and $f \ge c$. From Lemma 3.34 all vertices $z_0z_1 \ldots z_{n+1}$
that can be reached by a BFB-path from $\bar{y}$ with these parameters
must have

$y_b \ldots y_{n+1-f+b} = z_{f-c} \ldots z_{n+1-c}$.

(a) If b = 0, f = k, and c = 0, then we consider $1 \le k \le n - 1$
and $n \le k \le n + 1$ separately. First, if $1 \le k \le n - 1$, then our
induction hypothesis tells us that $x_{k+1} \ldots x_n \neq y_1 \ldots y_{n-k}$ when
we examine BFB-paths with parameters (0, k, 0) from y. Hence
we cannot have $x_k \ldots x_{n+1} = y_0 \ldots y_{n-k+1}$ in B(d, n + 2), so no
such BFB-path exists between $\bar{x}$ and $\bar{y}$.

Next, if $n \le k \le n + 1$, then since $x_{n+1} \neq y_0, y_1$, we will never
have $x_nx_{n+1} = y_0y_1$ or $x_{n+1} = y_0$, and so again no such
BFB-path exists in B(d, n + 2).

(b) If $b \ge 1$, then we have $f \ge 2$. In this case, in order for such a
BFB-path to exist from $\bar{x}$ to $\bar{y}$ we must have $x_{f-c} \ldots x_{n+1-c} = y_b \ldots y_{n+1-f+b}$.
However our induction hypothesis with parameters (b - 1, f - 1, c) tells us that
$x_{f-c} \ldots x_{n-c} \neq y_b \ldots y_{n-f+b}$,
and so no such BFB-path exists in B(d, n + 2).

(c) If $c \ge 1$, then we must have $f \ge 2$. In this case, to have such
a BFB-path between $\bar{x}$ and $\bar{y}$ we must have $x_{f-c} \ldots x_{n+1-c} = y_b \ldots y_{n+1-f+b}$.
However our induction hypothesis with parameters (b, f - 1, c - 1) shows
$x_{f-c+1} \ldots x_{n-c+1} \neq y_{b+1} \ldots y_{n-f+1+b}$,
and so no such BFB-path exists in B(d, n + 2).

Hence we cannot have a BFB-path of length less than n + 2 between
$\bar{y}$ and $\bar{x}$ in B(d, n + 2).

Therefore there is no path from $\bar{y}$ to $\bar{x}$ of length n + 1 or smaller, and so
$d(\bar{y}, \bar{x}) \ge n + 2$. As it is well known that the de Bruijn graph B(d, n + 2)
has diameter n + 2 (see [2]), we must have $d(\bar{y}, \bar{x}) = n + 2$.

□

In other words, Lemma 3.35 tells us the eccentricity of every node in the
graph B(d,n) is n for $d \ge 3$, and so the radius of B(d,n) is n. Note that when
d = 2 this does not always hold. For example, the graph B(2,3) does not have
any vertex at distance 3 from 011. See Figure 1.

Figure 1: B(2,3) does not contain any vertices at distance 3 from 011.
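The eccentricity statement can be checked directly on small cases. The sketch below is illustrative and makes one assumption explicit: since the proof moves along edges both forward and backward, distances are measured in the underlying undirected graph. The function names are hypothetical.

```python
from collections import deque
from itertools import product

def undirected_de_bruijn(d, n):
    """Underlying undirected graph of B(d, n): u and v are joined when u -> v or v -> u."""
    adj = {u: set() for u in product(range(d), repeat=n)}
    for u in adj:
        for a in range(d):
            v = u[1:] + (a,)
            adj[u].add(v)
            adj[v].add(u)
    return adj

def eccentricity(adj, source):
    """Largest distance from source to any vertex (the graph is connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

g33 = undirected_de_bruijn(3, 3)
print(all(eccentricity(g33, v) == 3 for v in g33))          # True
print(eccentricity(undirected_de_bruijn(2, 3), (0, 1, 1)))  # 2
```

The second line reproduces the binary exception: every vertex of B(2,3) lies within distance 2 of 011.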
Results and Discussion


4.1  Dominating  Set  Bounds 

First,  we  recall  the  definition  of  a  dominating  set  in  a  graph. 

Definition 4.36. A (directed) t-dominating set is a subset $S \subseteq V(G)$ such
that for all $v \in V(G)$ we have $B_t^-(v) \cap S \neq \emptyset$. That is, S is a (directed)
t-dominating set if every vertex in G is within (directed) distance t of some
vertex in S. We denote the size of a minimum t-dominating set in a graph G
by $\gamma_t(G)$.

Note  that  by  definition  every  identifying  code  is  also  a  dominating  set,  but 
not  conversely. 


4.1.1  Directed  de  Bruijn  Graphs 


We  begin  with  a  review  of  the  current  literature  and  then  proceed  with  our 
results. 


Theorem 4.37. [15] For $d \ge 2$, $n \ge 1$, $\gamma_1(B(d,n)) = \left\lceil \frac{d^n}{d+1} \right\rceil$.


In [15] a construction of a minimum dominating set for B(d,n) is given. Key
to this construction is the fact that every integer m corresponds to a string
(base d) in $\mathbb{Z}_d^n$, that we call $X_m$. The construction utilizes a special integer m
defined by:

$$m = \begin{cases} d^{n-2} + d^{n-4} + \cdots + d^{n-2k} + \cdots + d^2 + 1 \bmod d^n, & \text{if n is even;} \\ d^{n-2} + d^{n-4} + \cdots + d^{n-2k} + \cdots + d^3 + d \bmod d^n, & \text{if n is odd.} \end{cases}$$

Let $D = \{m, m+1, \ldots, m + \lceil \frac{d^n}{d+1} \rceil - 1\}$. Now let S be the set of strings
$\{X_i \mid i \in D\}$. Then S is a minimum size dominating set for B(d,n).
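The construction just described can be tried out on small parameters. The following sketch is an illustration of the description above rather than code from [15]: the function names are hypothetical, strings are written most significant digit first, and the integers in D are reduced mod $d^n$ before conversion.

```python
def to_string(i, d, n):
    """X_i: the length-n base-d representation of i mod d**n, as a tuple of digits."""
    i %= d ** n
    digits = []
    for _ in range(n):
        digits.append(i % d)
        i //= d
    return tuple(reversed(digits))

def special_integer(d, n):
    """m = d^(n-2) + d^(n-4) + ..., ending in 1 (n even) or d (n odd), reduced mod d^n."""
    return sum(d ** e for e in range(n - 2, -1, -2)) % d ** n

def constructed_set(d, n):
    """The strings X_i for i in D = {m, ..., m + ceil(d^n / (d+1)) - 1}."""
    size = -(-d ** n // (d + 1))  # ceiling of d^n / (d + 1)
    m = special_integer(d, n)
    return {to_string(i, d, n) for i in range(m, m + size)}

def is_dominating(d, n, s):
    """Every vertex must lie in s or be an out-neighbor of a vertex of s."""
    covered = set(s)
    for u in s:
        covered.update(u[1:] + (a,) for a in range(d))
    return len(covered) == d ** n

s = constructed_set(2, 3)
print(sorted(s))               # [(0, 1, 0), (0, 1, 1), (1, 0, 0)]
print(is_dominating(2, 3, s))  # True
```

For B(2,3) this gives m = 2 and the three strings 010, 011, 100, which together with their out-neighbors cover all eight vertices.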

Next we provide constructions for t-dominating sets. While others have
considered some variations of t-dominating sets (such as perfect dominating
sets in [18]), it does not appear that the general t-dominating sets have been
considered in the directed de Bruijn graph.
Theorem 4.38. The set $S \cup \{0^n\}$ where

$S = \{x \in A_d^n \mid x_{k(t+1)} \neq 0 \text{ for some } k \in \mathbb{Z}^+ \text{ and } x_i = 0 \text{ for all } i < k(t+1)\}$

is a t-dominating set of size

$$1 + d^{n-t-1}(d-1)\left(\frac{1 - d^{-(t+1)\lfloor \frac{n}{t+1} \rfloor}}{1 - d^{-(t+1)}}\right)$$

in B(d,n).

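Proof. Let x be a vertex in $A_d^n$. Assume that there are k zeros at the beginning
of x, but not k + 1 zeros, i.e. $x = 0^n$ or $x = 0^k \oplus a \oplus x(k + 2 : n)$ for
some $a \neq 0$. Let $l \in [0, t]$ be an integer so that $k + l \equiv t \pmod{t + 1}$, i.e.
$k + l = t + m(t + 1) = (m + 1)(t + 1) - 1$ for some $m \ge 0$. Now

$0^l \oplus x(1 : n - l) = 0^l \oplus 0^k \oplus a \oplus x(k + 2 : n - l) = 0^{k+l} \oplus a \oplus x(k + 2 : n - l)$

belongs to S except if $k + l \ge n$. If $k + l \ge n$, then $0^{n-k} \oplus x(1 : k) = 0^n \in S \cup \{0^n\}$
dominates x. Therefore every vertex is dominated by $S \cup \{0^n\}$.

There are $d^{n-k(t+1)} \cdot (d - 1)$ vertices which begin with exactly k(t + 1) - 1
zeros. Moreover, every vertex of $S \setminus \{0^n\}$ begins with at most n - 1 zeros. This
requires that $k(t + 1) \le n$, or $1 \le k \le \lfloor \frac{n}{t+1} \rfloor$. Finally, $0^n$ is added to the
dominating set $S \cup \{0^n\}$. Therefore the size of $S \cup \{0^n\}$ is

$$\begin{aligned}
1 + \sum_{i=1}^{\lfloor \frac{n}{t+1} \rfloor} d^{n-i(t+1)} \cdot (d-1)
&= 1 + d^n(d-1) \sum_{i=1}^{\lfloor \frac{n}{t+1} \rfloor} \left(d^{-(t+1)}\right)^i \\
&= 1 + d^n(d-1) \left(-1 + \sum_{i=0}^{\lfloor \frac{n}{t+1} \rfloor} \left(d^{-(t+1)}\right)^i\right) \\
&= 1 + d^n(d-1) \left(-1 + \frac{1 - \left(d^{-(t+1)}\right)^{\lfloor \frac{n}{t+1} \rfloor + 1}}{1 - d^{-(t+1)}}\right) \\
&= 1 + d^n(d-1) \left(\frac{d^{-(t+1)} - \left(d^{-(t+1)}\right)^{\lfloor \frac{n}{t+1} \rfloor + 1}}{1 - d^{-(t+1)}}\right) \\
&= 1 + d^{n-t-1}(d-1) \left(\frac{1 - d^{-(t+1)\lfloor \frac{n}{t+1} \rfloor}}{1 - d^{-(t+1)}}\right).
\end{aligned}$$

□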
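The construction and the size formula can be verified by direct enumeration for small parameters. The sketch below is illustrative: the function names are hypothetical, and domination is read as in the proof, meaning every vertex can be reached from some element of the set by a directed path of length at most t.

```python
from fractions import Fraction
from itertools import product

def theorem_set(d, n, t):
    """0^n together with all strings whose first non-zero letter is at a position divisible by t+1."""
    result = {(0,) * n}
    for x in product(range(d), repeat=n):
        first = next((i + 1 for i, a in enumerate(x) if a != 0), None)  # 1-based position
        if first is not None and first % (t + 1) == 0:
            result.add(x)
    return result

def is_t_dominating(d, n, t, s):
    """Grow the set along out-edges for t rounds and test whether everything is covered."""
    covered = set(s)
    frontier = set(s)
    for _ in range(t):
        frontier = {u[1:] + (a,) for u in frontier for a in range(d)} - covered
        covered |= frontier
    return len(covered) == d ** n

def size_as_sum(d, n, t):
    """1 + (d-1) * sum over i = 1..floor(n/(t+1)) of d^(n - i(t+1))."""
    return 1 + (d - 1) * sum(d ** (n - i * (t + 1)) for i in range(1, n // (t + 1) + 1))

def size_closed_form(d, n, t):
    """The closed form stated in the theorem, evaluated exactly with rationals."""
    r = Fraction(1, d ** (t + 1))
    k = n // (t + 1)
    return 1 + Fraction(d) ** (n - t - 1) * (d - 1) * (1 - r ** k) / (1 - r)

s = theorem_set(2, 3, 1)
print(sorted(s))                     # [(0, 0, 0), (0, 1, 0), (0, 1, 1)]
print(is_t_dominating(2, 3, 1, s))   # True
print(size_as_sum(2, 3, 1), size_closed_form(2, 3, 1))  # 3 3
```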
This result gives us the following lower bound on the size of a t-dominating
set in B(d,n).

Theorem 4.39. Bounds on the size of a t-dominating set in B(d,n) are given
by:

$$\gamma_t(B(d,n)) \ge \begin{cases} 1 + d^{n-t-1}(d-1)\left(\dfrac{1 - d^{-(t+1)\lfloor \frac{n}{t+1} \rfloor}}{1 - d^{-(t+1)}}\right), & \text{if } n \equiv t \pmod{t+1}; \\[2ex] d^{n-t-1}(d-1)\left(\dfrac{1 - d^{-(t+1)\lfloor \frac{n}{t+1} \rfloor}}{1 - d^{-(t+1)}}\right), & \text{otherwise.} \end{cases}$$


Proof. Suppose that the set S is a t-dominating set in B(d,n). Choose
$a \in A_d \setminus \{0\}$ and $i \in \mathbb{Z}$ so that $1 \le i \le \frac{n}{t+1}$ and $w \in A_d^{n-i(t+1)}$. Let
$x = w \oplus a \oplus 0^{i(t+1)-1}$. We note that $B_t^-(x)$ contains the following elements.

$B_t^-(x) = \{y \mid y = w' \oplus w \oplus a \oplus 0^{i(t+1)-1-k} \text{ for some } k \in [0,t],\ w' \in A_d^k\}$

Note that for all $v \neq x$ such that $v = v' \oplus b \oplus 0^{j(t+1)-1}$ with $b \in [d-1]$,
$j \le \frac{n}{t+1}$, and $v' \in A_d^{n-j(t+1)}$, we must have that $B_t^-(x) \cap B_t^-(v) = \emptyset$.
Hence each of these types of strings must be dominated by a different element
of S, and so we must have the following lower bound on |S|. Define

$A = \{v \mid v = v' \oplus b \oplus 0^{j(t+1)-1} \text{ with } b \in [d-1],\ j \le \tfrac{n}{t+1},\ v' \in A_d^{n-j(t+1)}\}$.

$$|S| \ge |A| = \sum_{i=1}^{\lfloor \frac{n}{t+1} \rfloor} d^{n-i(t+1)} \cdot (d-1) = d^{n-t-1}(d-1)\left(\frac{1 - d^{-(t+1)\lfloor \frac{n}{t+1} \rfloor}}{1 - d^{-(t+1)}}\right)$$

Finally, we consider the string $0^n$ and note that

$B_t^-(0^n) = \{z \mid z = z' \oplus 0^{n-s} \text{ with } z' \in A_d^s,\ s \le t\}$.

When we compare $B_t^-(0^n)$ with $B_t^-(x)$, we note that since $a \neq 0$ we must have
that the closest element of $B_t^-(x)$ to $0^n$ is x itself. Next we note that the string
closest to $0^n$ in the set A will occur when $j = \lfloor \frac{n}{t+1} \rfloor$. This will give us the string
with the most 0's packed at the right end. Finally, if $n \equiv p \pmod{t+1}$, then this
string looks like $v' \oplus b \oplus 0^{n-p-1}$ with $v' \in A_d^p$ and $b \neq 0$. If p = t, then we are
still unable to reach $0^n$, and so we must have at least one additional string in
S to cover $0^n$. □


4.1.2  Undirected  de  Bruijn  Graphs 


Lemma  4.40. 


dom(S(d, n))  > 


^2 — ,  if  d  is  even; 

if  d  is  odd. 
Proof. Every vertex $v \in V(B(d,n))$ can cover at most 2d + 1 vertices. Since
there are $d^n$ vertices in the graph, this provides the following lower bound.


dom(6(d,  n)) 


dn 


2d+l 


if  d  is  even; 
if  d  is  odd. 


O 


4.2 Automorphisms, Resolving Sets, and Determining Sets


4.2.1  Resolving  Sets 

Definition 4.41. A directed resolving set is a set S such that for all distinct
$u, v \in V(G)$ there exists $s \in S$ so that $d(s,u) \neq d(s,v)$. The directed metric
dimension is the minimum size of a directed resolving set. An example of a
directed resolving set in B(2,3) is given in Figure 2.


Note that this definition is not quite the same as that given in [8] (which
requires that there exist $s \in S$ so that $d(u,s) \neq d(v,s)$). Our definition
corresponds better to the definitions of domination and of identifying codes for
directed graphs that are used in this paper.

Theorem 4.42. The directed metric dimension for B(d,n) is $d^{n-1}(d-1)$.


Figure 3: A minimum size determining set for $B(2,3)$ (black vertex).

Proof. The following shows that for each $w \in A^{n-1}$ a directed resolving set for $B(d,n)$ must contain (at least) all but one of the vertices with prefix $w$. Suppose that $w \in A^{n-1}$ and $i \neq j \in A$ are such that neither $w \oplus i$ nor $w \oplus j$ is in our set $S$. Note that if $x, y \in V(B(d,n))$ with $x \neq y$, then the distance from $x$ to $y$ is completely determined by $x^-$ (and $y^+$). Since neither $w \oplus i$ nor $w \oplus j$ is in $S$, and both have the same prefix, $d(x, w \oplus i) = d(x, w \oplus j)$ for all $x \in S$. Thus $S$ is not a directed resolving set. Thus for every $w \in A^{n-1}$, $S$ must contain (at least) all but one of the strings $w \oplus j$ for $j \in A$. Thus $|S| \geq d^{n-1}(d-1)$. Since $\{w \oplus j \mid w \in A^{n-1},\ j \neq 0\}$ can easily be shown to be a directed resolving set, we have the desired equality. □
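The bound in Theorem 4.42 can be checked by brute force on small cases. The following Python sketch is only an illustration and is not part of the original argument. It assumes that vertices are tuples over $\{0, \ldots, d-1\}$, that edges run from $x_1 \ldots x_n$ to $x_2 \ldots x_n a$, and that $d(s,v)$ is the length of a shortest directed path from $s$ to $v$. The function names are hypothetical. It tests the set of all vertices whose last symbol is not 0, which has size $d^{n-1}(d-1)$.

```python
from collections import deque
from itertools import product


def vertices(d, n):
    """All strings of length n over {0, ..., d-1}, represented as tuples."""
    return list(product(range(d), repeat=n))


def distances_from(s, d):
    """BFS along edges x1...xn -> x2...xn a; returns d(s, v) for every vertex v."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for a in range(d):
            w = u[1:] + (a,)
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_directed_resolving(S, d, n):
    """True if the distance vectors (d(s, v) for s in S) are pairwise distinct."""
    tables = [distances_from(s, d) for s in sorted(S)]
    signatures = {tuple(t[v] for t in tables) for v in vertices(d, n)}
    return len(signatures) == d ** n


def nonzero_suffix_set(d, n):
    """Candidate resolving set: every vertex whose last symbol is not 0."""
    return {v for v in vertices(d, n) if v[-1] != 0}
```

Removing any single vertex from this candidate set leaves two vertices with a common prefix outside the set, so the check then fails, in line with the lower bound argument.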

The combination of Theorem 4.54 and Theorem 4.42 yields:

Corollary 4.43. The directed metric dimension for $B(d,n)$ is equal to the minimum size of a $t$-identifying code for $B(d,n)$ if $2t \le n$.

4.2.2  Automorphisms  and  Determining  Sets 

In this section we will use a determining set to help us illustrate the automorphism group of $B(d,2)$, study the relationship between $\mathrm{Aut}(B(d,n-1))$ and $\mathrm{Aut}(B(d,n))$, and use the result to find the determining number for each $B(d,n)$. First let's recall some definitions.

Definition 4.44. An automorphism of a graph $G$ is a permutation $\pi$ of the vertex set such that for all pairs of vertices $u, v \in V(G)$, $uv$ is an edge between $u$ and $v$ if and only if $\pi(u)\pi(v)$ is an edge between $\pi(u)$ and $\pi(v)$. An automorphism of a directed graph $G$ is a permutation $\pi$ of the vertex set such that for all pairs of vertices $u, v \in V(G)$, $uv$ is an edge from $u$ to $v$ if and only if $\pi(u)\pi(v)$ is an edge from $\pi(u)$ to $\pi(v)$. One automorphism in the binary (directed or undirected) de Bruijn graph is a map that sends each string to its complement.

Definition 4.45. [6] A determining set for $G$ is a set $S$ of vertices of $G$ with the property that the only automorphism that fixes $S$ pointwise is the trivial automorphism. The determining number of $G$, denoted $\mathrm{Det}(G)$, is the minimum size of a determining set for $G$. See Figure 3 for an example.

Note that an alternate definition for a determining set is a set $S$ with the property that whenever $f, g \in \mathrm{Aut}(G)$ satisfy $f(s) = g(s)$ for all $s \in S$, then $f(v) = g(v)$ for all $v \in V(G)$. That is, every automorphism is completely determined by its action on a determining set.

Notice that for both directed resolving sets and identifying codes, each vertex in a graph is uniquely identified by its relationship to the subset by properties preserved by automorphisms, so the subset is also a determining set. Thus every directed resolving set and every identifying code is a determining set. However, though domination is preserved by automorphisms, vertices are not necessarily uniquely identifiable by their relationship to a dominating set. Thus a dominating set is not necessarily a determining set. However, the relationships above mean that the size of a minimum determining set must be at most the size of a minimum identifying code or the directed metric dimension. For de Bruijn graphs we have shown that the latter numbers are rather large. Does this mean that the determining number is also large? We will see in Corollary 4.49 that the answer for directed de Bruijn graphs is a resounding 'No'.

Lemma 4.46. $S = \{00, 11, 22, 33, \ldots, (d-1)(d-1)\}$ is a determining set for $B(d,2)$.

Proof. Suppose that $\sigma \in \mathrm{Aut}(B(d,2))$ fixes $S$ pointwise. That is, $\sigma(ii) = ii$ for all $i \in A$. Choose $ij \neq rs \in V(B(d,2))$. Then either $i \neq r$ or $j \neq s$ (or both). If $i \neq r$ then $d(ii, rs) = 2$, which is distinct from $d(ii, ij) = 1$. Since an automorphism of a directed graph must preserve directed distance, $\sigma(ij) \neq rs$ if $i \neq r$. If $j \neq s$, then $d(rs, jj) = 2$, which is distinct from $d(ij, jj) = 1$. Thus, again using that $\sigma$ preserves directed distance, $\sigma(ij) \neq rs$ if $j \neq s$. Thus $\sigma(ij) = ij$ for all $ij \in V(B(d,2))$, and therefore $\sigma$ is the identity map and $S$ is a determining set. □

Note that we are using directed distances both from and to elements of the set $S$. Thus $S$ does not fit the definition of a directed resolving set for $B(d,2)$ (by [8], this would require that each vertex $v \in V(G)$ be distinguished by its directed distance to the vertices of the resolving set). However, directed distances both to and from a set can be used in determining automorphisms of a directed graph.

Lemma 4.47. $\mathrm{Aut}(B(d,2)) \cong \mathrm{Sym}(A_d)$.

Proof. Let $\sigma \in \mathrm{Sym}(A_d)$. Define $\varphi_\sigma$ on $V(B(d,2))$ by applying $\sigma$ to each vertex coordinate-wise. That is, $\varphi_\sigma(ab) = \sigma(a)\sigma(b)$. It is easy to show that $\varphi_\sigma$ preserves directed edges and thus is an automorphism. Further, distinct permutations in $\mathrm{Sym}(A_d)$ produce distinct automorphisms since they act differently on the vertices of the determining set $S$ defined above. Thus we have an injection $\mathrm{Sym}(A_d) \hookrightarrow \mathrm{Aut}(B(d,2))$.

Since the vertices of $S$ are precisely the vertices with loops, every automorphism of $B(d,2)$ must preserve $S$ setwise. This provides the necessary injection $\mathrm{Aut}(B(d,2)) \hookrightarrow \mathrm{Sym}(A)$. Thus $\mathrm{Aut}(B(d,2)) \cong \mathrm{Sym}(A_d)$. □

Note that we can consider the automorphisms of $B(d,2)$ as permutations of the loops, but we can simultaneously consider them as permutations of the symbols in the alphabet $A_d$. It can be useful to view the automorphisms in these two different ways.

Note that as shown in [1], $B(d,n)$ can be built inductively from $B(d,n-1)$ in the following way. The vertex $x_1 \ldots x_n \in V(B(d,n))$ corresponds to the directed edge from $x_1 \ldots x_{n-1}$ to $x_2 \ldots x_n$ in $B(d,n-1)$. The directed edge in $B(d,n)$ from $x_1 \ldots x_n \to x_2 \ldots x_n x_{n+1}$ corresponds to the directed 2-path $x_1 \ldots x_{n-1} \to x_2 \ldots x_n \to x_3 \ldots x_{n+1}$ in $B(d,n-1)$. That is, $B(d,n)$ is the directed line graph of the directed graph $B(d,n-1)$. Thus, by [13] (Chapter 27, Section 1.1), $\mathrm{Aut}(B(d,n-1)) \cong \mathrm{Aut}(B(d,n)) \cong \mathrm{Sym}(A)$. In the following paragraphs we detail this correspondence.

Suppose that $\varphi \in \mathrm{Aut}(B(d,n))$. Since $\varphi$ preserves directed edges, we know that both $\varphi(x_1 \ldots x_n) = a_1 \ldots a_n$ and $\varphi(x_2 \ldots x_{n+1}) = b_1 \ldots b_n$ if and only if $a_2 = b_1, \ldots, a_n = b_{n-1}$. Thus if $\varphi(x_1 \ldots x_{n-1} x_n) = a_1 \ldots a_{n-1} a_n$ then for every $z \in A$, $\varphi(x_1 \ldots x_{n-1} z) = a_1 \ldots a_{n-1} c$ for some $c \in A$. In particular, this allows us to define an automorphism $\varphi' \in \mathrm{Aut}(B(d,n-1))$ corresponding to $\varphi \in \mathrm{Aut}(B(d,n))$. Define $\varphi'$ by $\varphi'(x_1 \ldots x_{n-1}) = a_1 \ldots a_{n-1}$ where $\varphi(x_1 \ldots x_n) = a_1 \ldots a_n$. By the preceding discussion, $\varphi'$ is well-defined. It is also clearly a bijection on vertices of $B(d,n-1)$. Consider $x_1 \ldots x_{n-1}$ and $x_2 \ldots x_{n-1} x_n$, the initial and terminal vertices of a directed edge in $B(d,n-1)$. Since $\varphi$ preserves directed edges, if $\varphi(x_1 \ldots x_n) = a_1 \ldots a_{n-1} a_n$ then for any $z \in A$, $\varphi(x_2 \ldots x_n z) = a_2 \ldots a_n w$ for some $w \in A$. By definition of $\varphi'$, $\varphi'(x_1 \ldots x_{n-1}) = a_1 \ldots a_{n-1}$ and $\varphi'(x_2 \ldots x_n) = a_2 \ldots a_n$. Thus $\varphi'$ preserves the directed edge. Thus we get $\mathrm{Aut}(B(d,n)) \hookrightarrow \mathrm{Aut}(B(d,n-1))$.

In the other direction, suppose we are given $\varphi' \in \mathrm{Aut}(B(d,n-1))$. Since $\varphi'$ preserves directed edges, and directed edges of $B(d,n-1)$ are precisely the vertices of $B(d,n)$, $\varphi'$ defines a map on vertices of $B(d,n)$. That is, (with some abuse of notation)

$$\varphi(x_1 \ldots x_n) = \varphi'(x_1 \ldots x_{n-1} \to x_2 \ldots x_n) = \varphi'(x_1 \ldots x_{n-1}) \to \varphi'(x_2 \ldots x_n).$$

Thus, given $\varphi'(x_1 \ldots x_{n-1}) = a_1 \ldots a_{n-1}$, then $\varphi'(x_2 \ldots x_n) = a_2 \ldots a_n$ for some $a_n \in A$ and we define $\varphi(x_1 \ldots x_n) = a_1 \ldots a_n$. Further, since $\varphi'$ preserves directed 2-paths, $\varphi$ preserves directed edges. Thus we get

$$\mathrm{Aut}(B(d,n-1)) \hookrightarrow \mathrm{Aut}(B(d,n)).$$

Since  the  automorphisms  of  B(d ,  2)  are  permutations  of  the  loops,  and  of 
the  symbols  of  A,  by  induction,  so  are  the  automorphisms  of  B(d,n )  for  all  n. 
Thus  we  have  proved  the  following. 

Theorem 4.48. $\mathrm{Aut}(B(d,n)) \cong \mathrm{Sym}(A_d)$ for all $n \geq 2$.
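For very small parameters the order of the automorphism group can be counted directly and compared with $|\mathrm{Sym}(A_d)| = d!$. The following Python sketch is an illustration under the assumption that vertices are tuples over $\{0, \ldots, d-1\}$ with edges $x_1 \ldots x_n \to x_2 \ldots x_n a$. The function name is hypothetical, and the search is a plain backtracking enumeration rather than anything taken from the text.

```python
from itertools import product
from math import factorial


def count_automorphisms(d, n):
    """Count edge-preserving vertex permutations of the directed graph B(d, n)."""
    verts = list(product(range(d), repeat=n))
    edges = {(v, v[1:] + (a,)) for v in verts for a in range(d)}
    mapping = {}   # partial automorphism built so far
    used = set()   # images already taken

    def extend(i):
        if i == len(verts):
            return 1
        u = verts[i]
        total = 0
        for image in verts:
            if image in used:
                continue
            mapping[u] = image
            # Check every pair involving u (including the loop at u) in both directions.
            consistent = all(
                ((x, u) in edges) == ((mapping[x], image) in edges)
                and ((u, x) in edges) == ((image, mapping[x]) in edges)
                for x in mapping
            )
            if consistent:
                used.add(image)
                total += extend(i + 1)
                used.remove(image)
            del mapping[u]
        return total

    return extend(0)
```

The count only confirms the order of the group on these instances; the isomorphism itself is what the preceding argument establishes.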

Corollary 4.49. $\mathrm{Det}(B(d,n)) = \left\lceil \frac{d-1}{n} \right\rceil$.

Proof. Let $S$ be a minimum set of vertices in which each letter of $A_{d-1}$ occurs at least once. It is easy to see that $|S| = \left\lceil \frac{d-1}{n} \right\rceil$. Any permutation of $A_d$ that acts nontrivially on any letter of $A_d$ must act nontrivially on any string containing that letter. Thus if $\sigma \in \mathrm{PtStab}(S)$, then $\sigma$ must fix every letter contained in any string in $S$. Thus $\sigma$ fixes $0, 1, \ldots, d-2$ and therefore also $d-1$. We can conclude that $\sigma$ is the identity in both $\mathrm{Sym}(A_d)$ and in $\mathrm{Aut}(B(d,n))$, and therefore $S$ is a determining set. Thus $\mathrm{Det}(B(d,n)) \leq \left\lceil \frac{d-1}{n} \right\rceil$.

Further, if $|S| < \left\lceil \frac{d-1}{n} \right\rceil$ then fewer than $d-1$ letters of $A_d$ are used in strings in $S$. If $a, b \in A_d$ are not represented in $S$, then the transposition $(a\ b)$ in $\mathrm{Sym}(A_d)$ is a non-trivial automorphism of $B(d,n)$ that fixes $S$ pointwise. Thus $S$ is not a determining set. □
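The determining number can also be computed exhaustively for small $d$ and $n$. The sketch below is illustrative only. It assumes Theorem 4.48, so that every automorphism of $B(d,n)$ is a permutation of the alphabet applied letter by letter, and it represents vertices as tuples over $\{0, \ldots, d-1\}$. The function names are hypothetical.

```python
from itertools import combinations, permutations, product
from math import ceil


def fixes_pointwise(perm, S):
    """True if applying perm letter by letter leaves every string in S unchanged."""
    return all(tuple(perm[a] for a in v) == v for v in S)


def is_determining(S, d):
    """True if the identity is the only alphabet permutation fixing S pointwise."""
    identity = tuple(range(d))
    return all(
        perm == identity or not fixes_pointwise(perm, S)
        for perm in permutations(range(d))
    )


def determining_number(d, n):
    """Smallest size of a determining set, found by exhaustive search."""
    verts = list(product(range(d), repeat=n))
    for size in range(len(verts) + 1):
        if any(is_determining(S, d) for S in combinations(verts, size)):
            return size
```

For example, in $B(4,2)$ no single string mentions three of the four letters, so one vertex is never enough, while the pair 01, 23 pins down every letter.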

Thus  for  directed  de  Bruijn  graphs,  the  determining  number  and  the  directed 
metric  dimension  can  be  vastly  different  in  size. 


4.3  Identifying  Codes 

We  have  a  few  general  results  that  hold  true  for  any  graph.  The  first  result  will 
be  used  frequently  in  proving  the  existence  of  t-identifying  codes  in  a  graph. 

Definition 4.50. Two vertices $u, v \in V(G)$ are called $t$-twins whenever $B_t(u) = B_t(v)$. If the graph has no $t$-twins, then $G$ is called $t$-twin-free.

Theorem  4.51  ([7]).  For  a  given  graph  G  and  integer  t,  G  has  a  t-identifying 
code  if  and  only  if  it  is  t-twin-free. 

Next, we prove some useful inductive relationships for identifying codes.

Theorem 4.52. If $\mathcal{G}$ is $t$-identifiable, then it is also $(t-1)$-identifiable.

We will prove this result using the contrapositive of the following lemma.

Lemma 4.53. Suppose that $\{x, y\}$ are $t$-twins in $\mathcal{G}$. Then we must also have that $\{x, y\}$ are $(t+1)$-twins.

Proof. First, we note that if $\{x, y\}$ are $t$-twins, then $d(x,y) \leq t$, so $x$ and $y$ must be in the same component $C$ of $\mathcal{G}$. If $B_t(x) = B_t(y) = C$, then clearly $x$ and $y$ are $(t+1)$-twins, as there are no vertices $z$ such that $d(x,z) = t+1$ or $d(y,z) = t+1$. However, if $B_t(x) \subsetneq C$, then there exists some $\beta \in C$ such that $d(x,\beta) = t+1$. We know that $\beta$ must have some neighbor $\alpha$ such that $d(x,\alpha) = t$. Note that since $d(x,\beta) > t$ and $B_t(x) = B_t(y)$, we also must have $d(y,\beta) > t$ and $d(y,\alpha) \leq t$. Since $d(\alpha,\beta) = 1$, this implies that $d(y,\beta) = t+1$ and hence we have $\beta \in B_{t+1}(y)$. Since $\beta$ was arbitrary, this implies that $B_{t+1}(x) \subseteq B_{t+1}(y)$. The same argument follows with $x$ and $y$ reversed to show that $B_{t+1}(y) \subseteq B_{t+1}(x)$, and hence $B_{t+1}(x) = B_{t+1}(y)$ and we have that $\{x, y\}$ are $(t+1)$-twins in $\mathcal{G}$. □

We note that in the literature others have considered various classes of graphs, such as interval graphs and permutation graphs [12]. However, the vast majority of these results do not pertain to de Bruijn networks in particular. While these classes of graphs do not contain the class of de Bruijn graphs, the class of undirected line graphs is considered in [11]. Due to the recursive nature of the de Bruijn graphs, it is clear that they are indeed line graphs, and these results will be discussed further in Section 4.3.2.

4.3.1  Directed  de  Bruijn  Graphs 

We begin by considering directed de Bruijn graphs. Similar to the results found for dominating sets, the directed graphs have much simpler and more obvious identifying code constructions.

Theorem 4.54. If $B(d,n)$ is a $t$-identifiable graph, then the size of any $t$-identifying code is at least $d^{n-1}(d-1)$.

Proof. Choose $t < n$ and $a \neq b$ in $A$. Suppose that for some $w \in A^{n-1}$, neither $x = w \oplus a$ nor $y = w \oplus b$ is in a set $S$. Since $x$ and $y$ share a prefix, by Lemma 3.33, $B_t^-(x) \setminus \{x\} = B_t^-(y) \setminus \{y\}$. Since neither $x$ nor $y$ is in $S$, $\mathrm{ID}_S(x) = \mathrm{ID}_S(y)$. Thus $S$ is not a $t$-identifying code. Thus for each $w \in A^{n-1}$, a $t$-identifying code must contain, at least, all but one of $w \oplus a$ for $a \in A$. Thus a $t$-identifying code for $B(d,n)$ must have size at least $d^{n-1}(d-1)$. □

Note that the result above is independent of the radius $t$. An interesting consequence of this is the fact that increasing the radius of our identifying code does not produce any decrease in the size of a minimum identifying code. For example, consider the potential application of identifying codes in sensor networks. One might think that by increasing the sensing power (which corresponds to the radius of the identifying code) we would be able to place fewer sensors and thus incur a savings overall. However, Theorem 4.54 implies that providing more powerful (and thus more expensive) sensors does not allow us to place fewer sensors. Thus we should use sensors that have sensing distance equivalent to radius one. In fact, in the case of 2-identifying codes in $B(2,3)$, we actually require an extra vertex for a minimum size of seven!
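The effect of the radius can be explored by exhaustive search on $B(2,3)$. The Python sketch below is illustrative and rests on stated assumptions: vertices are tuples over $\{0, \ldots, d-1\}$, the in-ball $B_t^-(x)$ consists of all $y$ with a directed walk of length at most $t$ from $y$ to $x$, and the identifying set of $x$ is $S \cap B_t^-(x)$. The function names are hypothetical.

```python
from itertools import combinations, product


def in_ball(x, t, d):
    """B_t^-(x): every y = p + x[:n-j] with |p| = j <= t reaches x in j steps."""
    n = len(x)
    ball = set()
    for j in range(min(t, n) + 1):
        for prefix in product(range(d), repeat=j):
            ball.add(prefix + x[: n - j])
    return ball


def is_identifying_code(S, d, n, t):
    """True if the sets S & B_t^-(x) are non-empty and pairwise distinct."""
    S = set(S)
    seen = set()
    for x in product(range(d), repeat=n):
        ident = frozenset(S & in_ball(x, t, d))
        if not ident or ident in seen:
            return False
        seen.add(ident)
    return True


def min_identifying_code_size(d, n, t):
    """Smallest size of a t-identifying code, or None if none exists."""
    verts = list(product(range(d), repeat=n))
    for size in range(1, len(verts) + 1):
        if any(is_identifying_code(S, d, n, t) for S in combinations(verts, size)):
            return size
    return None
```

On $B(2,3)$ this search gives 4 for radius one and 7 for radius two, matching the lower bound $d^{n-1}(d-1) = 4$ and the value of seven quoted above.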

The remainder of this section is organized as follows. We first provide a construction of an optimal $t$-identifying code for $B(d,n)$ with $t \geq 2$ and $n \geq 2t$ in Theorems 4.55 and 4.56. Following the proof of this result, we highlight some variations that provide identifying codes for several other instances. Finally, we highlight an alternative construction for 1-identifying codes for all $B(d,n)$ when $d > 2$ or $n$ is odd.

Theorem 4.55. Suppose that $n > 5$, $d > 2$, $t > 2$, and $n \geq 2t$. Then the following set $S$ is an optimal $t$-identifying code of size $d^{n-1}(d-1)$ in $B(d,n)$.

$$S = \{x \in A_d^n \mid \text{for some } m \text{ and } \ell \le t,\ x^{(t,m)}(t+1-\ell : n-1) \text{ is } \ell\text{-periodic, but } x^{(t,m)}(t+1-\ell : n-1) \oplus (x_n + m) \text{ is not}\}$$

$$\cup\ \{x \in A_d^n \mid x_t \neq x_n \text{ and } x^{(t,m)}(t+1-\ell : n-1) \text{ is not } \ell\text{-periodic for any } m \text{ and } \ell \le t\}$$
Proof. First, let us note that for all $x \in A_d^n$, if $x^{(t,m)}(t+1-\ell : n)$ has period $\ell$ for any $m$ and $\ell \le t$, then $x \notin S$.

Let $x$ be given, and define $x_i$ to be the $i$th coordinate of $x$. We also define the following string $x_{i,j}$.

$$x_{i,j}(k) = \begin{cases} i, & \text{if } k = t; \\ j, & \text{if } k = n; \\ x(k), & \text{otherwise.} \end{cases}$$

In other words, $x_{i,j}$ is the string $x_1 \ldots x_{t-1}\, i\, x_{t+1} \ldots x_{n-1}\, j$.

Next, we note that $x_{i,j}(t+1-\ell : n-1) = x^{(t,\, i - x_t)}(t+1-\ell : n-1)$, which implies by Lemma 3.30 that $x_{i,j}(t+1-\ell : n-1)$ is $\ell$-periodic for at most one $\ell \le t \le \frac{n}{2}$ and for at most one $i \in A_d$.

Next, suppose that $x_{a,b}(t+1-\ell : n)$ is $\ell$-periodic and consider $x_{i,j}$. We note that $x_{i,j}^{(t,\, a-i)}(t+1-\ell : n-1) = x_{a,b}(t+1-\ell : n-1)$, which is $\ell$-periodic. Hence $x_{i,j}$ is a member of $S$ if and only if $x_{i,j}^{(t,\, a-i)}(t+1-\ell : n-1) \oplus (j + a - i)$ is not $\ell$-periodic. In other words, if and only if $i - j \not\equiv a - b \bmod d$. On the other hand, if $x_{i,j}(t+1-\ell : n-1)$ is not $\ell$-periodic for any $\ell$ and $i \in A_d$, then $x_{i,j} \in S$ if and only if $i \neq j$. Hence, there is exactly one $j \in A_d$ such that $x_{a,j} \notin S$, and similarly there is exactly one $i \in A_d$ such that $x_{i,b} \notin S$. These pairings tell us that $d^{n-1}$ strings are not in $S$, leaving the cardinality of $S$ at $d^n - d^{n-1}$, or $d^{n-1}(d-1)$.

Now that we have established the cardinality of $S$, we must show that no two nodes have the same identifying sets. Let $x, y \in V(B(d,n))$, and consider their identifying sets, called $I(x)$ and $I(y)$, respectively. Let $k$ be the smallest index such that $x_k \neq y_k$.


$k = 1$: Without loss of generality, we may assume that $x_1 = 0$ and $y_1 = 1$. Observe that we have the following strings contained in the identifying sets.

$$x' = 0^{t-1} \oplus 0 \oplus x(1 : n-t) \in B_t^-(x),$$
$$x'' = 0^{t-1} \oplus 1 \oplus x(1 : n-t) \in B_t^-(x),$$
$$y' = 1^{t-1} \oplus 1 \oplus y(1 : n-t) \in B_t^-(y), \text{ and}$$
$$y'' = 1^{t-1} \oplus 0 \oplus y(1 : n-t) \in B_t^-(y).$$

Note that $x' \notin B_t^-(y)$ and $y' \notin B_t^-(x)$, since $B_t^-(x)$ does not contain any vertices beginning with $1^{t+1}$ and $B_t^-(y)$ does not contain any vertices beginning with $0^{t+1}$. Next, we notice that at least one of $\{x', x''\}$ is in $S$. To see this, we note that $x'' = x'^{(t,1)}$. By the same point, we must have that at least one of $y', y''$ is a member of $S$.

Next, we note that if at least one of $x', y'$ is a member of $S$, we can use that string to separate $x$ and $y$. Otherwise, if either $x'' \notin B_t^-(y) \cap S$ or $y'' \notin B_t^-(x) \cap S$, we can separate $x$ and $y$ with the given string. As a last resort, we consider the case in which we have $x'' \in B_t^-(y) \cap S$ and $y'' \in B_t^-(x) \cap S$. Then we must have:

$$x'' = 0^{t-1} \oplus y(1 : n-t+1),$$
$$y'' = 1^{t-1} \oplus x(1 : n-t+1),$$

as $x''_t$ is the only 1 in $x''(1 : t+1)$ and $y''_t$ is the only 0 in $y''(1 : t+1)$. From this, we get the following string equalities:

$$x(1 : n-t) = y(2 : n-t+1),$$
$$y(1 : n-t) = x(2 : n-t+1).$$

From these string equalities, we see that $x_i = y_{i+1} = x_{i+2}$ and $y_i = x_{i+1} = y_{i+2}$ for all $i = 1, 2, \ldots, n-t-1$, and so $x(1 : n-t)$ and $y(1 : n-t)$ are both 2-periodic (since $x_1 \neq y_1$ we cannot have 1-periodic). Hence, $x''(t-1 : n) = 0 \oplus 1 \oplus x(1 : n-t)$ is also 2-periodic (recall that $x_1 = 0$ and $y_1 = 1$), so $x'' \notin S$. Hence we must have $x' \in S$, and so we may use $x'$ to separate $x$ and $y$.

$2 \le k \le n-t$: We know that there must exist some $s$ such that $x_1 \ldots x_{s-1} = y_1 \ldots y_{s-1}$, and these substrings are constant. Without loss of generality we may assume that $x_1 \ldots x_{s-1} = 0^{s-1} = y_1 \ldots y_{s-1}$ and $x_s = 1$, and so $2 \le s \le k$. Define the following strings.

$$y' = 1^t \oplus y(1 : n-t) \text{ and}$$
$$y'' = 1^{t-1} \oplus 0 \oplus y(1 : n-t)$$

As we saw in the previous case, we have that $\{y', y''\} \subseteq B_t^-(y)$ and $\{y', y''\} \cap S \neq \emptyset$. Now consider an arbitrary vertex $v \in B_t^-(x)$. Since $x_{s-1} x_s = 01$ and $2 \le s \le k \le n-t$, we know that $v(i-1 : i) = 01$ for some $i \in [s, s+t]$.

Additionally, we consider $y'$ and $i \in [2, s+t-1]$. For $i \le t$, we know that $y'(i-1 : i) = 11$, and for $i = t+1$, we have that $y'(i-1 : i) = 10$, and finally for $t+2 \le i \le s+t-1$ we have $y'(i-1 : i) = 00$. Similarly, for $i \in [2, s+t-1]$, we must have $y''(i-1 : i) \in \{00, 10, 11\}$. This implies that $y'(1 : s+t-1)$ and $y''(1 : s+t-1)$ do not contain the substring 01, and so if $y'$ or $y''$ is a member of $B_t^-(x)$, we must have either $d(y', x) = t$ or $d(y'', x) = t$, respectively. Hence we must have $y'_{t+i} = x_i$ or $y''_{t+i} = x_i$ for $i \in [1, n-t]$, and therefore that $x_k = y'_{t+k} = y''_{t+k} = y_k$, which is a contradiction. Thus neither $y'$ nor $y''$ can be a member of $B_t^-(x)$, and hence both strings separate $x$ and $y$.

$k > n-t$: Since we must have $x_1 = y_1$, we may assume without loss of generality that $x_1 = y_1 \neq 0$. Define the following strings.

$$u = 0^{n-k} \oplus x(1 : k) \text{ and}$$
$$v = 0^{n-k} \oplus y(1 : k)$$

Clearly we have $u \in B_t^-(x)$ and $v \in B_t^-(y)$. Additionally, we have $u(1 : n-1) = v(1 : n-1)$ and $u_n = x_k \neq y_k = v_n$. Note that this implies that for $a = x_{t-n+k}$ we have $u = u_{a, x_k}$ and $v = u_{a, y_k}$ with $x_k \neq y_k$. By our argument at the very beginning of the proof, at most one of these can lie outside of $S$, and so we must have $\{u, v\} \cap S \neq \emptyset$.

We now have two cases. First, if both $u \notin B_t^-(y)$ and $v \notin B_t^-(x)$, then any string from $\{u, v\} \cap S$ separates $x$ and $y$, and we are done. Otherwise, assume without loss of generality that $u \in B_t^-(y)$. Then we must have $u = w \oplus y(1 : p)$ for at least one $p \in [n-t, n]$. Take $p$ to be the largest such $p$ possible. Since $y_1 \neq 0 = u_i$ for all $i \in [1, n-k]$, we must have $p \le k$. Additionally, if $p = k$ then we have $y(1 : k) = y(1 : p) = x(1 : k)$, which is a contradiction since $y_k \neq x_k$. This implies that we must have $p < k$. Hence we must have the following string of equalities:

$$0^{n-k} \oplus x(1 : k) = u = w \oplus y(1 : p) = w \oplus x(1 : p).$$

Thus we have $x(k-p+1 : k) = x(1 : p)$, or $x_i = x_{k-p+i}$ for $i = 1, 2, \ldots, p$. Additionally we note that the following inequalities hold:

$$2(k-p) = 2k - 2p \le 2k - 2(n-t) \le k + 2t - n \le k.$$

Hence since $x_i = x_{k-p+i}$ for $i \in [1, p]$ and $2(k-p) \le k$, we know that $x(1 : k)$ has period $(k-p)$. In fact, since we chose $p$ to be maximum, $x(1 : k)$ is $(k-p)$-periodic.

Next, we show that $u(t+1-(k-p) : n)$ is also $(k-p)$-periodic. First, we note the following inequalities:

$$n - (t+1-(k-p)) + 1 = n - t + k - p \ge 2(k-p).$$

The last line comes from the facts: $n - t \ge t \ge n - p \ge k - p$. Hence our string length is at least $2(k-p)$, and from our previous paragraphs, so long as $u(t+1-(k-p) : n)$ is contained in $u(n-k+1 : n) = x(1 : k)$, we know that it must have period $(k-p)$. For this we note that

$$t + 1 - (k-p) \ge (n-p) + 1 - (k-p) = n - k + 1,$$

and so $u(t+1-(k-p) : n)$ indeed has period $(k-p)$, and thus $u \notin S$, except if $(n-1) - (t+1-(k-p)) + 1 < 2(k-p)$. In this case we must have $k - p = t$, $k = n = 2t$, and $p = t$.

If $u^{(t,m)}(t+1-\ell : n-1)$ is $\ell$-periodic for some $\ell \le k-p = t$ and $m$, then by Lemma 3.31 we must have that $u^{(t,m)}(t+1-\ell : n-1) \oplus (u_n + m)$ is also $\ell$-periodic, and hence $u \notin S$. On the other hand, if $u^{(t,m)}(t+1-\ell : n-1)$ is not $\ell$-periodic for any $\ell \le t$ and $m$, then we note that $u_n = u_{2t} = u_t$, so again $u \notin S$. Therefore in all cases we have $u \notin S$, so we must have $v \in S$.

All that remains is to show that $v \notin B_t^-(x)$. We note that if $v \in B_t^-(x)$, then $y(1 : k)$ is $\ell$-periodic for some $\ell \le k+t-n$ (using the same argument as we used to show that $x(1 : k)$ was $(k-p)$-periodic). Since $x(1 : k) = (y(1 : k))^{(k,m)}$ for some $m$, by Lemma 3.29 it is not possible that both $x(1 : k)$ and $y(1 : k)$ are periodic. Hence we must have $v \notin B_t^-(x)$, and so we may use $v$ to separate $x$ and $y$.

□

When we combine the previous theorem with the following result, we have a complete set of constructions for optimal $t$-identifying codes in $B(d,n)$ with $t \geq 2$ and $n \geq 2t$.

Theorem 4.56. Let $n > 3$ and $S = A_d^n \setminus \{x_1 a x_3 x_4 \ldots x_{n-1} a \mid a \in A_d\}$. If $n$ is even, then $S$ is a 2-identifying code for $B(d,n)$. If $n$ is odd, then $S' = (S \cup \{(ab)^{\frac{n-1}{2}} b \mid a \neq b \in A_d\}) \setminus \{(ab)^{\frac{n-1}{2}} a \mid a \neq b \in A_d\}$ is a 2-identifying code for $B(d,n)$. In both these cases, the 2-identifying code is of optimal size $d^{n-1}(d-1)$.

Proof. Consider an arbitrary string $x = x_1 x_2 x_3 \ldots x_n \in A_d^n$, and define the set $T = \mathrm{ID}_S(x)$. We'll consider the contents of $T$ in four cases based on the equality of $x_1, x_{n-1}$ and of $x_2, x_n$. First let $C = \{a \oplus x(1 : n-2) \mid a \in A \setminus \{x_{n-2}\}\}$.

Case 1. If $x_2 = x_n$ and $x_1 = x_{n-1}$, then $T = A \oplus C$. Thus $|T| = d^2 - d$.

Case 2. If $x_2 \neq x_n$ and $x_1 = x_{n-1}$, then $T = (A \oplus C) \cup \{x\}$. If $x \in A \oplus C$ then $x^+ = a \oplus x(1 : n-2)$ for some $a \in A \setminus \{x_{n-2}\}$. In this case, we have $x_2 x_3 \cdots x_n = a x_1 x_2 \cdots x_{n-2}$. This implies that we have $x_1 = x_3 = x_5 = \cdots$, and also that $x_2 = x_4 = x_6 = \cdots$. Since this case requires that $x_1 = x_{n-1}$, we must have that either $n$ is even or that $x_1 = x_2 = x_3 = \cdots = x_n$. In either case, this contradicts our assumption that $x_2 \neq x_n$. Thus $x \notin A \oplus C$, and we conclude that $|T| = d^2 - d + 1$.

Case 3. If $x_2 = x_n$ and $x_1 \neq x_{n-1}$, then $T = A \oplus \{C \cup \{x^-\}\}$. If $x^- = a \oplus x(1 : n-2)$ for some $a \in A \setminus \{x_{n-2}\}$, then $a x_1 x_2 \cdots x_{n-2} = x_1 x_2 \cdots x_{n-1}$. This implies that we have $x_1 = x_2 = x_3 = \cdots = x_{n-2} = x_{n-1}$. This contradicts our assumption that $x_1 \neq x_{n-1}$. Thus $x^- \neq a \oplus x(1 : n-2)$ for any $a \in A \setminus \{x_{n-2}\}$, and we conclude that $|T| = d^2$.

Case 4. If $x_2 \neq x_n$ and $x_1 \neq x_{n-1}$, then $T = (A \oplus \{C \cup \{x^-\}\}) \cup \{x\}$. As in Case 3, since $x_1 \neq x_{n-1}$, $A \oplus \{C \cup \{x^-\}\}$ contains $d^2$ distinct elements. Let us consider whether $x \in A \oplus \{C \cup \{x^-\}\}$. If not, then $|T| = d^2 + 1$. There are two cases to consider.

a. If $x \in A \oplus \{x^-\}$, then $x^+ = x^-$. In this case, we must have that $x_2 x_3 x_4 \cdots x_n = x_1 x_2 \cdots x_{n-1}$, which implies that we have the following chain of equalities: $x_1 = x_2 = x_3 = \cdots = x_{n-1} = x_n$. This contradicts the assumptions that $x_2 \neq x_n$ and $x_1 \neq x_{n-1}$. Thus, this case does not occur.

b. If $x \in A \oplus C$, then $x^+ = a \oplus x(1 : n-2)$ for some $a \in A$. Then $x_2 x_3 \cdots x_n = a x_1 x_2 \cdots x_{n-2}$. This implies that $x_1 = x_3 = x_5 = \cdots$, and also that $x_2 = x_4 = x_6 = \cdots$. If $n$ is even, this contradicts our assumptions that $x_2 \neq x_n$ and $x_1 \neq x_{n-1}$. Thus for even $n$, this case does not occur. For $n$ odd, this case only occurs if $x \in \{(ab)^{\frac{n-1}{2}} a\}$.

Thus, if $n$ is even, or $n$ is odd and $x \notin \{(ab)^{\frac{n-1}{2}} a\}$, we can see that $T = \mathrm{ID}_S(x)$ completely determines the string $x$. In particular, given $T$ we can decide which case we are in based on $|T|$. We can then determine $x_1, \ldots, x_n$ based on the content of $T$. Thus in these cases $S$ is an identifying code.

However, if $n$ is odd, and $x \in \{(ab)^{\frac{n-1}{2}} a\}$ we must change $S$ to get an identifying code. Note that $B_2^-((ab)^{\frac{n-1}{2}} a) \cup \{(ab)^{\frac{n-1}{2}} b\} = B_2^-((ab)^{\frac{n-1}{2}} b)$. Since our set $S$ contains vertices of the form $(ab)^{\frac{n-1}{2}} a$ but not $(ab)^{\frac{n-1}{2}} b$, these two types of vertices must have identical identifying sets with respect to $S$. Thus by adding the vertices in $\{(ab)^{\frac{n-1}{2}} b\}$, we are able to create distinct identifying sets with respect to $S \cup \{(ab)^{\frac{n-1}{2}} b\}$. However, we note that we now have the vertices of $\{(ab)^{\frac{n-1}{2}} b\}$ and $\{(ab)^{\frac{n-1}{2}} a\}$ in our identifying code, but that $B_2^+((ab)^{\frac{n-1}{2}} a) \cup \{b(ba)^{\frac{n-1}{2}}\} = B_2^+(b(ba)^{\frac{n-1}{2}})$. This implies that the inclusion of both $(ab)^{\frac{n-1}{2}} b$ and $(ab)^{\frac{n-1}{2}} a$ in our identifying code is only necessary if they are required to identify vertex $(ab)^{\frac{n-1}{2}} a$ from vertex $b(ba)^{\frac{n-1}{2}}$. So, as long as we can identify $(ab)^{\frac{n-1}{2}} a$ differently from $b(ba)^{\frac{n-1}{2}}$ without using $b(ba)^{\frac{n-1}{2}}$, we need only include $(ab)^{\frac{n-1}{2}} b$ and not $(ab)^{\frac{n-1}{2}} a$ in our identifying code. Since these two vertices have disjoint in-balls of radius 2 for $n > 3$, they must have distinct 2-identifying sets. Thus $S'$ is a 2-identifying code in this case. □
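The even case of Theorem 4.56, and the reason the odd case needs a modified set, can be checked by brute force on the binary graph. The Python sketch below is an illustration only. It assumes vertices are tuples over $\{0, \ldots, d-1\}$, that $S$ is the set of strings with $x_2 \neq x_n$, and that the identifying set of $x$ is $S \cap B_2^-(x)$ with in-balls defined through directed walks ending at $x$. The function names are hypothetical.

```python
from itertools import product


def in_ball(x, t, d):
    """B_t^-(x): every y = p + x[:n-j] with |p| = j <= t reaches x in j steps."""
    n = len(x)
    ball = set()
    for j in range(min(t, n) + 1):
        for prefix in product(range(d), repeat=j):
            ball.add(prefix + x[: n - j])
    return ball


def is_identifying_code(S, d, n, t):
    """True if the sets S & B_t^-(x) are non-empty and pairwise distinct."""
    S = set(S)
    seen = set()
    for x in product(range(d), repeat=n):
        ident = frozenset(S & in_ball(x, t, d))
        if not ident or ident in seen:
            return False
        seen.add(ident)
    return True


def second_vs_last_set(d, n):
    """All strings whose second symbol differs from their last symbol."""
    return {x for x in product(range(d), repeat=n) if x[1] != x[-1]}
```

For $d = 2$ and $n = 5$ the unmodified set fails because 01010 and 01011 receive the same identifying set, which is the collision the passage from $S$ to $S'$ is meant to remove.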

Finally,  we  provide  additional  constructions  of  identifying  codes.  Theorems 
4.57  and  4.58  have  proofs  very  similar  to  that  of  Theorem  4.55,  so  we  omit  them 
here. 

Theorem 4.57. Assume that $d > 2$, and $n > 3$. Then the following subset $S$ is an optimal 1-identifying code of size $d^{n-1}(d-1)$ in $B(d,n)$.

$$S = \{x \in A_d^n \mid \text{for some } m \text{ and } \ell \in \{1, 2\},\ x^{(1,m)}(1 : n-1) \text{ is } \ell\text{-periodic or almost } \ell\text{-periodic, but } x^{(1,m)}(1 : n-1) \oplus (x_n + m) \text{ is not}\}$$
U 


is  not  £-periodic  for  any  m  and  £  £  {1,  2}} 
Theorem 4.58. Assume that $d > 2$. Then the following subset $S$ is a $t$-identifying code of size $d^{n-1}(d-1) + d^t$ in the directed de Bruijn graph $B(d,n)$, if $n = 2t - 1 > 5$.

$$S = \{x \in A_d^n \mid \text{for some } m \text{ and } \ell \le t-1,\ x^{(t,m)}(t+1-\ell : n-1) \text{ is } \ell\text{-periodic, but } x^{(t,m)}(t+1-\ell : n-1) \oplus (x_n + m) \text{ is not}\}$$

$$\cup\ \{x \in A_d^n \mid x_t \neq x_n \text{ and } x^{(t,m)}(t+1-\ell : n-1) \text{ is not } \ell\text{-periodic for any } m \text{ and } \ell \le t-1\}$$

$$\cup$$

{x  G  A}}  |  :  n  —  1)  is  almost  t-periodic,  for  some  m.} 

We note that the construction in Theorem 4.58 is not optimal. To find an optimal $t$-identifying code when $n = 2t - 1$ is an open problem to be considered in the future. For the cases when $n < 2t - 1$, we have the following theorem.

Theorem 4.59. There is no $t$-identifying code in the directed de Bruijn graph $B(d,n)$ when $n \le 2t - 2$.

Proof. Let $u = 0^{n-t} \oplus 1 \oplus 0^{t-2} \oplus 1$ and $v = 0^{n-t} \oplus 1 \oplus 0^{t-2} \oplus 0$. Since $B_t^-(u)$ and $B_t^-(v)$ both consist of all vertices that end with $0^{n-t}$ or $0^{n-t} \oplus 1 \oplus 0^k$ where $k = 0, 1, \ldots, n-t-1$, $u$ and $v$ are $t$-twins. Thus $B(d,n)$ has no $t$-identifying code. □
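The twin pair used in this proof is easy to test directly. The following Python sketch is illustrative and assumes that vertices are tuples over $\{0, \ldots, d-1\}$ and that $B_t^-(x)$ is the set of vertices with a directed walk of length at most $t$ to $x$. The function names are hypothetical, and the pair is only defined for $2 \le t \le n$.

```python
from itertools import product


def in_ball(x, t, d):
    """B_t^-(x): every y = p + x[:n-j] with |p| = j <= t reaches x in j steps."""
    n = len(x)
    ball = set()
    for j in range(min(t, n) + 1):
        for prefix in product(range(d), repeat=j):
            ball.add(prefix + x[: n - j])
    return ball


def twin_pair(n, t):
    """The strings u = 0^(n-t) 1 0^(t-2) 1 and v = 0^(n-t) 1 0^(t-2) 0."""
    head = (0,) * (n - t) + (1,) + (0,) * (t - 2)
    return head + (1,), head + (0,)


def has_twins(d, n, t):
    """True if two distinct vertices of B(d, n) share the same in-ball of radius t."""
    balls = [frozenset(in_ball(x, t, d)) for x in product(range(d), repeat=n)]
    return len(set(balls)) < len(balls)
```

By contrast, $B(2,3)$ with $t = 2$ sits at the boundary $n = 2t - 1$ and has no twins, consistent with the existence of a 2-identifying code there.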

As an additional treat for the reader, we provide a simple construction for 1-identifying codes in $B(d,n)$ whenever we have either $d > 2$ or $n$ odd.

Theorem  4.60.  If  n  is  odd,  or  n  is  even  and  d  >  2,  then 

s  =  \  ©  a  |  a  G  Ad} 

is  an  identifying  code  for  B(d,n).  Further  this  identifying  code  has  optimal  size 

(d-  ljd""1. 

Proof. Define $S$ as in the statement of the theorem. First, we will see that the
identifying set for every vertex has size either $d$ or $d-1$. Let $x = x_1 x_2 \cdots x_n$;
then

$N^-(x) \cap S = \{A \oplus x_1 x_2 \cdots x_{n-1}\} \setminus \{x_{n-1} x_1 x_2 \cdots x_{n-1}\}.$

If $x_1 = x_n$, then $\mathrm{ID}_S(x) = N^-(x) \cap S$ has size $d-1$. Whereas, if $x_1 \ne x_n$,
then $\mathrm{ID}_S(x) = \{x\} \cup (N^-(x) \cap S)$ has size $d$.

From this it is clear that every vertex has a non-empty identifying set. However,
we must also show that every identifying set is unique. Suppose there are
two distinct vertices $x, y \in V(B(d,n))$ such that $\mathrm{ID}_S(x) = \mathrm{ID}_S(y)$. Call their
identical identifying set $T$. We look at the two cases, $|T| = d$ and $|T| = d-1$,
separately below.
Suppose that $|T| = d$. Then $\{x,y\} \subseteq T$ by our assumption on $T$ and our
earlier reasoning. Since $x \ne y$, this means that $B(d,n)$ contains both directed
arcs $x \to y$ and $y \to x$. This allows us to conclude that $\{x,y\} = \{(ab)^k, (ba)^k\}$
for some distinct $a, b \in A$ with $k = n/2$. In particular we must have $n$ even.
Below are the precise identifying sets for $x$ and $y$.

$\mathrm{ID}_S((ab)^k) = \{(ab)^k, (ba)^k\} \cup \{c(ab)^{k-1}a \mid c \in A \setminus \{a,b\}\}$

$\mathrm{ID}_S((ba)^k) = \{(ab)^k, (ba)^k\} \cup \{c(ba)^{k-1}b \mid c \in A \setminus \{a,b\}\}$

If $d > 2$ these two identifying sets are in fact different, which is a contradiction.

Suppose that $|T| = d-1$. Then neither $x$ nor $y$ is in $T$, which means neither is
in $S$. However, since their identifying sets are identical, they have
identical first neighborhoods. By definition of first neighborhoods, this means
that $x$ and $y$ have the same prefix but different final letters. By the definition
of $S$, at least one of $x$, $y$ is a member of $S$, which is a contradiction. □
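A brute-force check of Theorem 4.60 on small graphs is sketched below. It assumes, as the proof suggests, that $S$ consists of the strings whose first and last letters differ, and that the identifying set of $x$ is its closed in-neighborhood intersected with $S$. The helper names are illustrative only, and letters are the digits 0 to $d-1$.

```python
from itertools import product

def vertices(d, n):
    """All strings of length n over the digits 0..d-1."""
    return ["".join(w) for w in product(map(str, range(d)), repeat=n)]

def code_first_last_differ(d, n):
    """Candidate code S: strings whose first and last letters differ."""
    return {w for w in vertices(d, n) if w[0] != w[-1]}

def identifying_sets(d, n, code):
    """ID_S(x) = ({x} | N^-(x)) & S, with N^-(x) = {c x_1 ... x_{n-1}}."""
    ids = {}
    for x in vertices(d, n):
        closed_in_nbhd = {x} | {str(c) + x[:-1] for c in range(d)}
        ids[x] = frozenset(closed_in_nbhd & code)
    return ids

def is_identifying(d, n, code):
    ids = identifying_sets(d, n, code)
    # every identifying set must be non-empty and no two may coincide
    return all(ids.values()) and len(set(ids.values())) == len(ids)

S = code_first_last_differ(2, 3)
print(sorted(S), is_identifying(2, 3, S))
```

For $d = 2$ and $n$ even the check fails, because $(ab)^k$ and $(ba)^k$ receive the same identifying set, which is exactly the case excluded by the theorem.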

4.3.2  Undirected  de  Bruijn  Graphs 

General  Results 

We now consider the general undirected de Bruijn graph. Our first result proves
the existence of $t$-identifying codes in $B(d,n)$ for $d \ge 3$ and relatively large $n$
(with respect to $t$).

Theorem 4.61. $B(d,n)$ is $t$-identifiable for $d \ge 3$ and $n \ge 2t$.

To  prove  this  theorem,  we  will  first  prove  the  following  lemma,  which  relies 
heavily  on  Lemma  3.34. 

Lemma 4.62. For $n \ge 2t$, the number of distinct $t$-prefixes in $B_t(y) \setminus ([d]^t \oplus
y_1 y_2 \cdots y_{n-t})$ is at most

$$1 - d^{\lfloor t/2 \rfloor} + 2 \sum_{j=0}^{t-1} d^j.$$

Proof. Following Lemma 3.34, the $t$-prefixes in $B_t(y)$ take one of the following
three forms (matching the types in Lemma 3.34).

1. $y_1 y_2 \cdots y_t$;

2. $[d]^g \oplus y_{b-f+1} \cdots y_{t+b-f-g}$;

3. $[d]^{f-c} \oplus y_{b+1} \cdots y_{t+b+c-f}$.

In order to more easily count these $t$-prefixes, we will sort them by the last
letter that appears, and then sort them from longest $[d]^k$ prefix to shortest.
Since the longest $[d]^k$ prefix also counts the strings with shorter $[d]^k$ prefix, so
long as the strings end in the same letter, this will allow us to count unique
prefixes. We begin by rewriting the types of prefixes so as to more easily do
this.

1. $y_1 y_2 \cdots y_t$

2. We find this range of $y$-subsequences by noticing the following:

$\min(b-f) = (g+1) - (t-2g-1) = 3g + 2 - t$, and

$\max(b-f) = \max(g+1, t-g) = t - g$.

Hence  for  0  <  g  <  t-^: 

$[d]^g \oplus y_{3g+2-t+1} \cdots y_{2g+2}$

$[d]^g \oplus y_{t-g+1} \cdots y_{2t-2g}$

Last letters: $y_i$ such that $2g+2 \le i \le 2t-2g$.

Range: $y_i$ is a last letter whenever $t+1 \le i \le 2t$.

Max $g$ for each $i$: $\lfloor \frac{2t-i}{2} \rfloor$.

3.  Note  that  in  this  case,  we  can  cover  all  cases  with  c  >  0  by  a  different  case 
with  c  =  0,  so  we  may  just  consider  the  cases  c  =  0  to  simplify  things. 

(a)  For  0  <  /  < 

$[d]^f \oplus y_1 \cdots y_{t-f}$

$[d]^f \oplus y_f \cdots y_{t-1}$

Last letters: $y_{t-f}, \ldots, y_{t-1}$.

Range: $y_i$ is a last letter whenever $t-f \le i \le t-1$.

Max  f  for  each  i: 

(b)  For  <  /  <t  (recall  we  eliminated  f  =  t): 

$[d]^f \oplus y_1 \cdots y_{t-f}$

$\vdots$

$[d]^f \oplus y_{t-f+1} \cdots y_{2t-2f}$

Last letters: $y_1, y_2, \ldots, y_{t-2}$.

Range: $y_i$ is a last letter whenever $t-f \le i \le 2t-2f$.

Max  f  for  each  i: 
Note that because we require $n \ge 2t$, cases (2) and (3) together cover all possible
$t$-prefixes. That is, we cannot possibly have any $t$-prefixes that end in $[d]^k$ for
any $k > 0$. Additionally, note that each case covers a different range of last
letters: (1) $i = t$; (2) $t+1 \le i \le 2t$; and (3) $t-f \le i \le t-1$. Hence we may
count each case separately.


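1. There is only one string in this case.

2. We showed previously that $\max(g) = \lfloor \frac{2t-i}{2} \rfloor$. Thus we have the following
formula.

$$\begin{cases} d^{(t-1)/2} + 2 \sum_{j=0}^{(t-3)/2} d^j, & \text{if } t \text{ is odd;} \\ 2 \sum_{j=0}^{(t-2)/2} d^j, & \text{if } t \text{ is even.} \end{cases}$$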

3. In this case, our subcases (a) and (b) overlap. We break up our ranges
slightly differently this time to determine $\max(f)$.


(a)  1  <  i  <  t—^~. 

In this range for $i$, we must be in the higher range for $f$, so we have
$\max(f) = \lfloor \frac{2t-i}{2} \rfloor$.

(b)  ^  <  i  <  t  -  2. 

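Considering both ranges for $f$, we have the following maximum value
for $f$, depending on $i$.

$$\max(f) = \max\left( \left\lfloor \frac{t+1}{2} \right\rfloor, \left\lfloor \frac{2t-i}{2} \right\rfloor \right) = \left\lfloor \frac{2t-i}{2} \right\rfloor$$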

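(c) $i = t-1$.

For this value of $i$, we must be in the lower range for $f$, and hence
we have $\max(f) = \lfloor \frac{t+1}{2} \rfloor = \lfloor \frac{2t-i}{2} \rfloor$.

Hence all cases (a)-(c) have $\max(f) = \lfloor \frac{2t-i}{2} \rfloor$. Thus we have the following
formula.

$$\begin{cases} 2 \sum_{j=(t+1)/2}^{t-1} d^j, & \text{if } t \text{ is odd;} \\ -d^{t/2} + 2 \sum_{j=t/2}^{t-1} d^j, & \text{if } t \text{ is even.} \end{cases}$$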

Now when we combine all of our equations we get the following final count.

$$1 - d^{\lfloor t/2 \rfloor} + 2 \sum_{j=0}^{t-1} d^j$$

Note that this provides only an upper bound on our $t$-prefixes: if we have
repeated letters then we may have double-counted. □

Now  we  are  ready  to  prove  our  theorem. 
Proof of Theorem 4.61. Consider two arbitrary strings: $x = x_1 x_2 \cdots x_n$ and
$y = y_1 y_2 \cdots y_n$. We will show that these two strings cannot be $t$-twins by
showing that $B_t(x) \setminus B_t(y) \ne \emptyset$. This will be done in two cases: $x_1 x_2 \cdots x_{n-t} \ne
y_1 y_2 \cdots y_{n-t}$ and $x_{t+1} x_{t+2} \cdots x_n \ne y_{t+1} y_{t+2} \cdots y_n$. Note that this covers all
cases, since $x \ne y$ implies there is some $i \in [1,n]$ such that $x_i \ne y_i$. Additionally,
since $n \ge 2t$, we must have that $i \in [1, n-t] \cup [t+1, n]$. Hence at least one of
these two cases must be true.

1. $x_1 x_2 \cdots x_{n-t} \ne y_1 y_2 \cdots y_{n-t}$.

We will show that there must exist some string in $B_t(x)$ that is not in
$B_t(y)$. In particular, there is a string $a \in [d]^t \oplus x_1 \cdots x_{n-t}$ such that
$a \notin B_t(y)$. We do this by counting the number of distinct $t$-prefixes in
$B_t(y) \setminus ([d]^t \oplus y_1 y_2 \cdots y_{n-t})$, and showing that this number is smaller than
$d^t$. Note that because of the case that we are in, we need not consider the
strings in $[d]^t \oplus y_1 y_2 \cdots y_{n-t}$. If we can show that the number of $t$-prefixes
is smaller than $d^t$, then there must be some string $z \in B_t(x) \setminus B_t(y)$.

From Lemma 4.62, we know that the total number of $t$-prefixes in $B_t(y) \setminus
([d]^t \oplus y_1 y_2 \cdots y_{n-t})$ is at most $1 - d^{\lfloor t/2 \rfloor} + 2 \sum_{j=0}^{t-1} d^j$, and that one of those
$t$-prefixes is $y_1 \cdots y_t$, which we may ignore because of the case that we are
in. Define $f(t) = -d^{t/2} + 2 \sum_{j=0}^{t-1} d^j$ and $g(t) = d^t - f(t)$. If we can show
that $g(t)$ is always positive for $d \ge 3$, then we know that there exists a
string $a \in ([d]^t \oplus x_1 \cdots x_{n-t}) \setminus ([d]^t \oplus y_1 \cdots y_{n-t}) \subseteq B_t(x) \setminus B_t(y)$. Then
we know that $x$ and $y$ are not $t$-twins.

Consider our new function $g(t)$.

$$g(t) = d^t + d^{t/2} - 2 \sum_{j=0}^{t-1} d^j = d^t + d^{t/2} - \frac{2(d^t - 1)}{d - 1} = \frac{d^t(d-1) + d^{t/2}(d-1) - 2(d^t - 1)}{d - 1}$$

We will determine the nature of this function by finding the roots. We find
the roots by setting the numerator equal to 0 and making the substitution
$x = d^{t/2}$.

$$d^t(d-1) + d^{t/2}(d-1) - 2(d^t - 1) = x^2(d-3) + x(d-1) + 2$$

The roots of this equation are $x = -1$ and $x = \frac{-4}{2d-6}$. Reversing our
substitution, this equates to $d^{t/2} = -1$ and $d^{t/2} = \frac{-4}{2d-6}$. The first root is
impossible, and the second will only be possible when $2d - 6 < 0$, or $d < 3$.
Hence, if $d \ge 3$, our function has no real roots and is always positive.

2. $x_{t+1} x_{t+2} \cdots x_n \ne y_{t+1} y_{t+2} \cdots y_n$.

In this case, we want to show that there exists some string:

$a \in (x_{t+1} \cdots x_n \oplus [d]^t) \setminus (y_{t+1} \cdots y_n \oplus [d]^t) \subseteq B_t(x) \setminus B_t(y).$

Because of the symmetric nature of the strings and edges in the de Bruijn
graph, this case follows the same as the previous case, with analogous
lemmas to Lemmas 3.34 and 4.62 for $t$-suffixes (instead of $t$-prefixes). Thus
we will again always have fewer than $d^t$ suffixes represented in $B_t(y) \setminus
(y_{t+1} \cdots y_n \oplus [d]^t)$, so we will always be able to find the desired string $a$
that can distinguish $x$ from $y$.


□ 
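The arithmetic behind Case 1 can be sanity-checked numerically. The sketch below evaluates the bound of Lemma 4.62 and the number of $t$-prefixes left over once $y_1 \cdots y_t$ is discarded. It also checks, for even $t$, the factorization $x^2(d-3) + x(d-1) + 2 = (x+1)((d-3)x+2)$, which is an observation added here rather than a step taken in the text. The function names are illustrative.

```python
def prefix_bound(d, t):
    """Upper bound of Lemma 4.62: 1 - d^floor(t/2) + 2 * sum_{j=0}^{t-1} d^j."""
    return 1 - d ** (t // 2) + 2 * sum(d ** j for j in range(t))

def spare_prefixes(d, t):
    """d^t minus the bound after discarding the prefix y_1...y_t itself."""
    return d ** t - (prefix_bound(d, t) - 1)

def numerator_factors(d, t):
    """For even t, compare the numerator of g(t) with (x+1)((d-3)x+2), x = d^(t/2)."""
    x = d ** (t // 2)
    numerator = d ** t * (d - 1) + x * (d - 1) - 2 * (d ** t - 1)
    return numerator == (x + 1) * ((d - 3) * x + 2)

for d in (3, 4, 5):
    print(d, [spare_prefixes(d, t) for t in range(1, 6)])
```

Every printed value is positive for $d \ge 3$, whereas for $d = 2$ and $t = 2$ the spare count is zero, so the counting argument gives nothing in the binary case.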


Specific  Results 

Theorem 4.63. For $n \ge 3$, the graph $B(2,n)$ is identifiable.

Proof. For $n = 3$, the following is a minimum 1-identifying code on $B(2,3)$.

{001,010,011,101}

When $n \ge 4$, we have the following proof, with many cases. We will prove
this result by showing that it is not possible to have two vertices $x$ and $y$ that
are twins. Suppose (for a contradiction) that $x$ and $y$ are in fact twins in $B(2,n)$.
First, the 1-balls for each vertex are as follows.


$B_1(x) = \{x_1 x_2 \cdots x_n,\ 0x_1 \cdots x_{n-1},\ 1x_1 \cdots x_{n-1},\ x_2 \cdots x_n 0,\ x_2 \cdots x_n 1\}$

$B_1(y) = \{y_1 y_2 \cdots y_n,\ 0y_1 \cdots y_{n-1},\ 1y_1 \cdots y_{n-1},\ y_2 \cdots y_n 0,\ y_2 \cdots y_n 1\}$

Without loss of generality, we assume that $x_1 = 0$. Then we have two cases:
either $x_1 x_2 \cdots x_n = 0y_1 \cdots y_{n-1}$, or $x_1 x_2 \cdots x_n \in \{y_2 \cdots y_n 0,\ y_2 \cdots y_n 1\}$.

1. $x_1 x_2 \cdots x_n = 0y_1 \cdots y_{n-1}$.

In this case, we know that $0x_2 \cdots x_n = 0y_1 \cdots y_{n-1}$, and so $x_2 \cdots x_n =
y_1 \cdots y_{n-1}$. From this, we know the following equality holds.

$\{x_2 \cdots x_n 0,\ x_2 \cdots x_n 1\} = \{y_1 y_2 \cdots y_n,\ y_1 y_2 \cdots \bar{y}_n\}$

This gives us two cases: either $y_1 y_2 \cdots \bar{y}_n \in \{0y_1 \cdots y_{n-1},\ 1y_1 \cdots y_{n-1}\}$, or
$y_1 y_2 \cdots \bar{y}_n \in \{y_2 \cdots y_n 0,\ y_2 \cdots y_n 1\}$.

(a) $y_1 y_2 \cdots \bar{y}_n \in \{0y_1 \cdots y_{n-1},\ 1y_1 \cdots y_{n-1}\}$

The fact that $y_2 \cdots \bar{y}_n = y_1 \cdots y_{n-1}$ implies the following.

$y_1 = y_2 = \cdots = y_{n-1} = \bar{y}_n$

Because we are in Case 1 and $x_2 \cdots x_n = y_1 \cdots y_{n-1}$, we also have
the following equalities.

$x_2 = x_3 = \cdots = x_n = y_1$

Hence our 1-balls must be as shown below for some $a \in \{0,1\}$.


$B_1(x) = \{0a \cdots a,\ 00a \cdots a,\ 10a \cdots a,\ a \cdots a0,\ a \cdots a1\}$

$B_1(y) = \{a \cdots a\bar{a},\ 0a \cdots a,\ 1a \cdots a,\ a \cdots a\bar{a}0,\ a \cdots a\bar{a}1\}$

Note that since $n \ge 4$, we have two strings in $B_1(y)$ that have different
second-to-last and third-to-last letters; however, in $B_1(x)$ there are no
such strings. Hence these sets cannot possibly be equal, which is a
contradiction.

(b) $y_1 y_2 \cdots \bar{y}_n \in \{y_2 \cdots y_n 0,\ y_2 \cdots y_n 1\}$

This implies that $y_1 y_2 \cdots y_{n-1} = y_2 \cdots y_n$, and so we have the following
chain of equalities.

$y_1 = y_2 = \cdots = y_{n-1} = y_n$

Hence $y = a^n$ and $x = 0a^{n-1}$ for some $a \in \{0,1\}$. Since $x \ne y$, we
must have $a = 1$ and thus our 1-balls, given below, are clearly not
equal, which is a contradiction.

$B_1(x) = \{01 \cdots 1,\ 001 \cdots 1,\ 101 \cdots 1,\ 1 \cdots 10,\ 1 \cdots 11\}$

$B_1(y) = \{1 \cdots 1,\ 01 \cdots 1,\ 1 \cdots 10\}$


2. $x_1 x_2 \cdots x_n \in \{y_2 \cdots y_n 0,\ y_2 \cdots y_n 1\}$ and $y_2 = 0$.

From this, we have the following 1-balls.

$B_1(x) = \{0x_2 \cdots x_n,\ 00x_2 \cdots x_{n-1},\ 10x_2 \cdots x_{n-1},\ x_2 \cdots x_n 0,\ x_2 \cdots x_n 1\}$

$B_1(y) = \{y_1 0x_2 \cdots x_{n-1},\ 0y_1 0x_2 \cdots x_{n-2},\ 1y_1 0x_2 \cdots x_{n-2},\ 0x_2 \cdots x_{n-1}0,\ 0x_2 \cdots x_{n-1}1\}$

Now we have two cases: either (a) $1y_1 0x_2 \cdots x_{n-2} = 10x_2 \cdots x_{n-1}$, or (b)
$1y_1 0x_2 \cdots x_{n-2} \in \{x_2 \cdots x_n 0,\ x_2 \cdots x_n 1\}$.


(a) $1y_1 0x_2 \cdots x_{n-2} = 10x_2 \cdots x_{n-1}$.

This statement implies that we have the following chain of equalities.

$y_3 = \cdots = y_n = x_2 = \cdots = x_{n-1}$

In particular, we now know that $x = 0a \cdots a$ and $y = 00a \cdots a$. Hence
our 1-balls are given below.

$B_1(x) = \{0a \cdots a,\ 00a \cdots a,\ 10a \cdots a,\ a \cdots a0,\ a \cdots a1\}$

$B_1(y) = \{00a \cdots a,\ 000a \cdots a,\ 100a \cdots a,\ 0a \cdots a0,\ 0a \cdots a1\}$

Since $000a \cdots a \in B_1(y)$, the only way to have $B_1(x) = B_1(y)$ would
require $a = 0$, and thus $x = y$, which is a contradiction.

(b) $1y_1 0x_2 \cdots x_{n-2} \in \{x_2 \cdots x_n 0,\ x_2 \cdots x_n 1\}$ and $x_2 = 1$.

In this instance, we know that $x_2 \cdots x_n = 1y_1 0x_2 \cdots x_{n-3}$, and hence
$x_5 \cdots x_n = x_2 \cdots x_{n-3}$. This tells us that $x = 01y_1 01y_1 \cdots$ and $y =
y_1 01y_1 01 \cdots$. In particular, our 1-balls are now shown below.

$B_1(x) = \{01y_1 01y_1 \cdots,\ 001y_1 01y_1 \cdots,\ 101y_1 01y_1 \cdots,\ 1y_1 01y_1 \cdots 0,\ 1y_1 01y_1 \cdots 1\}$

$B_1(y) = \{y_1 01y_1 01 \cdots,\ 0y_1 01y_1 01 \cdots,\ 1y_1 01y_1 01 \cdots,\ 01y_1 01 \cdots 0,\ 01y_1 01 \cdots 1\}$

Note that $B_1(y)$ contains two distinct strings beginning with 01, while
$B_1(x)$ contains only one such string. Hence it is not possible that
$B_1(x) = B_1(y)$, which contradicts our initial assumption.


□ 

For the binary undirected de Bruijn graph $B(2,n)$, it turns out to be more difficult
to establish a set pattern for $t$-identifiability. Note that $B(2,10)$ is not
8-identifiable as we would expect. We have two pairs of 8-twins:

{0111011110,0111101110}

and

{1000010001,1000100001}.

Note that in the majority of cases, we find that the maximum $t$ such that $B(2,n)$ is
$t$-identifiable is $t = n - 2$; however, there are a few cases in which this does not
hold.
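Twin pairs such as the ones listed above can be searched for mechanically. The sketch below groups the vertices of the undirected graph $B(d,n)$ by their closed $t$-balls, computed by breadth-first search; two vertices are $t$-twins exactly when they land in the same group. The function names are illustrative, and letters are assumed to be the digits 0 to $d-1$.

```python
from itertools import product

def neighbors(x, d):
    """Undirected neighbors of x: shift left or right and fill in any letter."""
    letters = [str(a) for a in range(d)]
    return {x[1:] + a for a in letters} | {a + x[:-1] for a in letters}

def ball(x, t, d):
    """Closed ball of radius t around x, by breadth-first search."""
    seen = {x}
    frontier = {x}
    for _ in range(t):
        frontier = {y for v in frontier for y in neighbors(v, d)} - seen
        seen |= frontier
    return frozenset(seen)

def find_twins(d, n, t):
    """Groups of two or more vertices of B(d, n) sharing the same t-ball."""
    groups = {}
    for x in map("".join, product(map(str, range(d)), repeat=n)):
        groups.setdefault(ball(x, t, d), []).append(x)
    return sorted(tuple(g) for g in groups.values() if len(g) > 1)

print(find_twins(2, 3, 1))  # no 1-twins in B(2,3)
print(find_twins(2, 3, 2))  # radius 2 is already too large for n = 3
```

Calling `find_twins(2, 10, 8)` is a direct way to examine the two pairs quoted above; the graph has only 1024 vertices, so the search is quick.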

For $d \ge 3$, we have a few remaining specific results. Some of these results
are also covered by the more general results in the previous section.

Lemma 4.64. For $n > 3$ and $d \ge 3$, $B(d,n)$ is 2-identifiable.
Proof. We will show that for arbitrary $x_1 \cdots x_n$ and $y_1 \cdots y_n$ we have $B_2(x) \ne
B_2(y)$. Recall that we have the following contents in $B_2(y)$.

$$B_2(y_1 \cdots y_n) = \left\{ \begin{array}{l} y_1 \cdots y_n \\ {[d]} \oplus y_1 \cdots y_{n-1} \\ y_2 \cdots y_n \oplus [d] \\ {[d]^2} \oplus y_1 \cdots y_{n-2} \\ y_3 \cdots y_n \oplus [d]^2 \\ {[d]} \oplus y_2 \cdots y_n \\ y_1 \cdots y_{n-1} \oplus [d] \end{array} \right\}$$


We have two cases.

1. $x_1 \cdots x_{n-2} \ne y_1 \cdots y_{n-2}$.

We will show that there exists some $a_1 \cdots a_n \in [d]^2 \oplus x_1 \cdots x_{n-2} \subseteq B_2(x)$
that is not in $B_2(y)$. The following table shows the options of how $a_1 \cdots a_n$
could lie inside of $B_2(y)$, and the corresponding choices for $a_1 a_2$.

$a_1 \cdots a_n = y_1 \cdots y_n$                       $y_1 y_2$
$a_1 \cdots a_n \in [d] \oplus y_1 \cdots y_{n-1}$      $[d] \oplus y_1$
$a_1 \cdots a_n \in y_2 \cdots y_n \oplus [d]$          $y_2 y_3$
$a_1 \cdots a_n \in [d]^2 \oplus y_1 \cdots y_{n-2}$    N/A
$a_1 \cdots a_n \in y_3 \cdots y_n \oplus [d]^2$        $y_3 y_4$
$a_1 \cdots a_n \in [d] \oplus y_2 \cdots y_n$          $[d] \oplus y_2$
$a_1 \cdots a_n \in y_1 \cdots y_{n-1} \oplus [d]$      $y_1 y_2$

Note that the fourth row in this table is marked "N/A" because of the case
that we are in. Additionally, note that this list accounts for at most $2d + 2$
of the $d^2$ possibilities for $a_1 a_2$. Hence whenever $d \ge 3$ there is always a
choice for $a_1 a_2$ that does not lie on this list. By choosing that option, we
have found $a_1 \cdots a_n \in B_2(x) \setminus B_2(y)$.

2. $x_3 \cdots x_n \ne y_3 \cdots y_n$.

We will show that there exists some $a_1 \cdots a_n \in x_3 \cdots x_n \oplus [d]^2 \subseteq B_2(x)$
that is not in $B_2(y)$. The following table shows the options of how $a_1 \cdots a_n$
could lie inside of $B_2(y)$, and the corresponding choices for $a_{n-1} a_n$.

$a_1 \cdots a_n = y_1 \cdots y_n$                       $y_{n-1} y_n$
$a_1 \cdots a_n \in [d] \oplus y_1 \cdots y_{n-1}$      $y_{n-2} y_{n-1}$
$a_1 \cdots a_n \in y_2 \cdots y_n \oplus [d]$          $y_n \oplus [d]$
$a_1 \cdots a_n \in [d]^2 \oplus y_1 \cdots y_{n-2}$    $y_{n-3} y_{n-2}$
$a_1 \cdots a_n \in y_3 \cdots y_n \oplus [d]^2$        N/A
$a_1 \cdots a_n \in [d] \oplus y_2 \cdots y_n$          $y_{n-1} y_n$
$a_1 \cdots a_n \in y_1 \cdots y_{n-1} \oplus [d]$      $y_{n-1} \oplus [d]$

Note that the fifth row in this table is marked "N/A" because of the case
that we are in. Additionally, note that this list accounts for at most $2d + 2$
of the $d^2$ possibilities for $a_{n-1} a_n$. Hence whenever $d \ge 3$ there is always a
choice for $a_{n-1} a_n$ that does not lie on this list. By choosing that option,
we have found $a_1 \cdots a_n \in B_2(x) \setminus B_2(y)$.


□ 


Lemma 4.65. For $d \ge 3$, the graph $B(d,3)$ is 2-identifiable.

Proof. We will show that for arbitrary $x_1 x_2 x_3$ and $y_1 y_2 y_3$ we have $B_2(x) \ne
B_2(y)$. We have three cases.

1. If $x_1 \ne y_1$, then consider $a_1 a_2 a_3 \in [d]^2 \oplus x_1 \subseteq B_2(x)$. We will choose
$a_1 a_2$ so as to avoid all contents of the following set of size at most $3d - 1$:
$\{[d] \oplus y_1,\ [d] \oplus y_2,\ y_3 \oplus [d],\ y_1 y_2,\ y_2 y_3\}$. Since there are a total of $d^2$ options
for $a_1 a_2$, as long as $d \ge 3$ there is still a choice for $a_1 a_2$ that lies outside
of the given set. Once we have selected such a pair $a_1 a_2$, then we have a
string $a_1 a_2 a_3 \in B_2(x) \setminus B_2(y)$.

2. If $x_3 \ne y_3$, then consider $a_1 a_2 a_3 \in x_3 \oplus [d]^2$. We will choose $a_2 a_3$ so as
to avoid all contents of the following set of size at most $3d - 1$: $\{[d] \oplus
y_1,\ y_2 \oplus [d],\ y_3 \oplus [d],\ y_1 y_2,\ y_2 y_3\}$. Since there are a total of $d^2$ options for
$a_2 a_3$, as long as $d \ge 3$ there is still a choice for $a_2 a_3$ that lies outside of the
given set. Once we have selected such a pair $a_2 a_3$, then we have a string
$a_1 a_2 a_3 \in B_2(x) \setminus B_2(y)$.

3. Lastly, if $x_1 = y_1$, $x_3 = y_3$, but $x_2 \ne y_2$, then we have several cases.

(a) If $x_1 x_2 = y_2 y_3$, then we must have $x = abb$ and $y = aab$ for some
$a \ne b \in [d]$. Then $cbb \in B_2(x) \setminus B_2(y)$ for any $c \ne a, b$.

(b) If $x_1 = x_3$, then $x = aba$ and $y = aca$ for some $b \ne c \in [d]$. At
least one of $b, c$ differs from $a$, and so choose some $k \in \{b, c\} \setminus \{a\}$. Then
$kak \in B_2(x) \triangle B_2(y)$, and so $B_2(x) \ne B_2(y)$.

(c) If $x_1 x_2 \ne y_2 y_3$ and $x_1 \ne x_3$, then we consider $a_1 a_2 a_3 \in x_1 x_2 \oplus [d]$
with $a_3 \ne y_1, y_2$. Then $a_1 a_2 a_3 \in B_2(x) \setminus B_2(y)$.

Hence in all cases, $B_2(x) \ne B_2(y)$ for arbitrary $x, y$, so it is not possible to have
a pair of 2-twins $\{x, y\}$ in $B(d,3)$. Thus $B(d,3)$ is 2-identifiable. □

Note that Lemmas 4.64 and 4.65 combined tell us that all graphs $B(d,n)$
with $d, n \ge 3$ are 2-identifiable.


Bounds 


Theorem 4.66 ([14]). The size of an identifying code for a regular graph with
$N$ vertices and vertex degree $D$ is lower-bounded by

$$M(t) \ge \frac{2N}{D+2}.$$

Proof. Consider the $K \times N$ binary matrix $A$ where $a_{kn} = 1$ if and only if the
$k$th codeword covers the $n$th vertex, and $a_{kn} = 0$ otherwise. There must be
$K(D+1)$ nonzero entries in the matrix, as each codeword covers $D+1$ vertices.
On the other hand, at most $K$ columns can have weight 1, while the remaining
$N - K$ columns must have weight at least 2, so the number of nonzero entries must
be at least $K + 2(N-K) = 2N - K$. Thus $K(D+1) \ge 2N - K$, or

$$M(t) = K \ge \frac{2N}{D+2}.$$

□


Corollary 4.67. The size of an identifying code for a graph with $N$ vertices
and maximum degree $D$ is lower-bounded by

$$M(t) \ge \frac{2N}{D+2}.$$

Proof. Using the previous argument, the number of nonzero entries
in the matrix is at most $K(D+1)$. The rest of the argument remains unchanged. □


Note  that  for  the  undirected  de  Bruijn  graph,  the  multiple  edges  and  loops  do 
not  change  the  identifying  codes  from  the  underlying  simple,  undirected  graph. 
Thus  we  can  apply  Corollary  4.67  easily  by  simply  removing  multiple  edges  and 
loops  and  considering  a  graph  with  maximum  degree  2d. 


Corollary 4.68. The size of an identifying code for the undirected binary de
Bruijn graph $B(2,n)$ is lower-bounded by

$$M(t) \ge \frac{2^{n+1}}{6} = \frac{1}{3} \cdot |B(2,n)|.$$

Corollary 4.69. The size of an identifying code for the undirected de Bruijn
graph $B(d,n)$ is lower-bounded by

$$M(t) \ge \frac{d^n}{d+1}.$$

In our small examples of $B(2,3)$ and $B(2,4)$, this bound is consistent with the
minimum codes. For example, a minimum 1-identifying code for $B(2,3)$ is
{001,010,011,101} with size 4, while Corollary 4.68 gives us a lower bound of
$8/3 \approx 2.67$. Also, a minimum 1-identifying code for $B(2,4)$ is
{0001,0010,0101,0111,1011,1100} with size 6, while Corollary 4.68 gives us a
lower bound of $16/3 \approx 5.33$, a tight bound in this case.
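The two small examples can be verified directly. The sketch below checks that each listed set is a 1-identifying code of the undirected binary graph and compares its size with the bound $2N/(D+2)$, taking the maximum degree to be $D = 2d = 4$ as discussed above. The helper names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product
from math import ceil

def closed_ball(x):
    """Closed 1-ball of x in the undirected binary de Bruijn graph."""
    return {x} | {x[1:] + a for a in "01"} | {a + x[:-1] for a in "01"}

def is_identifying(n, code):
    """True if every vertex of B(2, n) has a distinct, non-empty identifying set."""
    ids = [frozenset(closed_ball("".join(v)) & code) for v in product("01", repeat=n)]
    return all(ids) and len(set(ids)) == len(ids)

def lower_bound(num_vertices, max_degree):
    """The bound 2N / (D + 2) of Corollary 4.67, as an exact fraction."""
    return Fraction(2 * num_vertices, max_degree + 2)

code3 = {"001", "010", "011", "101"}
code4 = {"0001", "0010", "0101", "0111", "1011", "1100"}
for n, code in ((3, code3), (4, code4)):
    bound = lower_bound(2 ** n, 4)
    print(n, is_identifying(n, code), len(code), bound, ceil(bound))
```

Rounding the bound up to an integer gives 3 for $B(2,3)$ and 6 for $B(2,4)$, so the bound is attained only in the second case.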

An  alternative  lower  bound  is  given  by  the  following  theorem  that  is  based 
on  the  fact  that  the  de  Bruijn  graph  is  a  line  graph.  For  more  on  this,  see 
Chapter  4.5. 


Theorem 4.70 ([11]). Let $G$ be a twin-free line graph on $n \ge 4$ vertices. Then
we have

$$\gamma^{ID}(G) \ge \frac{3\sqrt{2}}{4} \sqrt{n}.$$

For the de Bruijn graph, this implies a lower bound on the order of $\sqrt{d^n}$. However,
the bound from Corollary 4.69 gives us a better bound for this class of graphs.

Non-Optimal  Constructions 

The  following  result  is  inspired  by  an  identical  result  for  the  n-dimensional 
binary  cube  in  [14]. 

Theorem 4.71. If $C^*$ is an optimal 2-identifying code for the undirected de
Bruijn graph $B(d,n)$, then

$C = \{w \mid \exists\, v \in C^* \text{ s.t. } d(v,w) = 1\}$

is a 1-identifying code.

Proof. We will show that every vertex in the undirected de Bruijn graph $B(d,n)$
is covered by a unique set of codewords. Let $x = x_1 x_2 \cdots x_n \in B(d,n)$. Note
that the vertices at distance 1 from $x$ are:

$U_0 = x_2 x_3 \cdots x_n 0$,          $V_0 = 0 x_1 x_2 \cdots x_{n-1}$,
$U_1 = x_2 x_3 \cdots x_n 1$,          $V_1 = 1 x_1 x_2 \cdots x_{n-1}$,
$U_2 = x_2 x_3 \cdots x_n 2$,          $V_2 = 2 x_1 x_2 \cdots x_{n-1}$,
$\vdots$
$U_{d-1} = x_2 x_3 \cdots x_n (d-1)$,  $V_{d-1} = (d-1) x_1 x_2 \cdots x_{n-1}$.

We will refer to vertices of type $U_i$ as undirected out-neighbors and vertices of
type $V_i$ as undirected in-neighbors for obvious reasons. We have three cases.

Case 1: $x \in C^*$: In this case, $x$ is covered by all $U_i, V_i$ for $i \in \{0, 1, 2, \ldots, d-1\}$
for the code $C$. If some $w = w_1 w_2 \cdots w_n$ is also covered by all $d$ $U_i$'s and
all $d$ $V_j$'s, then we must prove that $w = x$.

Note that for such a vertex $w$, we can only have that $w$ is an undirected
out-neighbor of at most one $U_i$ and is an undirected in-neighbor of at most
one $V_j$. Thus we have that $w$ is the in-neighbor of at least one $U_i$ and the
out-neighbor of at least one $V_j$. Then we know some information about
the letters of $w$, namely:

$w = x_1 x_2 \cdots x_{n-1} w_n = w_1 x_2 \cdots x_{n-1} x_n.$

Hence we must have $w_1 = x_1$ and $w_n = x_n$, or in other words $w = x$.

Case 2: $x \in C \setminus C^*$: In this case, $x$ is covered by itself and every one of its
neighbors in $C$. The only other vertices covered by $x$ are the vertices
$U_i, V_i$ for $i \in \{0, 1, 2, \ldots, d-1\}$. We must show that $x$ is covered by
something that each of its neighbors is not covered by (or vice versa),
thus proving that they have different cover sets.
Let $T \in \{U_i \mid i \in \{0, 1, \ldots, d-1\}\} \cup \{V_i \mid i \in \{0, 1, \ldots, d-1\}\}$ be arbitrary.
Since $x \in C$, there must be some $y \in C^*$ with $d(y,x) = 1$. This implies
that $d(y,T) \le 2$. Then in the code $C^*$, both $x$ and $T$ are covered by $y$,
since $C^*$ has radius $t = 2$. However, since $C^*$ is also an identifying code, $x$
and $T$ must have different identifying sets, so there is some $z \in C^*$ that
either:

1. covers $x$ and not $T$, or

2. covers $T$ and not $x$.

In Case (2.1), we must have $d(z,x) \le 2$ and $d(z,T) > 2$, which implies that
$d(z,x) = 2$. Then there must be some $a$ between $z$ and $x$, i.e. $d(z,a) = 1$
and $d(x,a) = 1$. Then $a \in C$ and also covers $x$ for code $C$, but $a$ can't
cover $T$ for $C$ as otherwise we would have $T$ covered by $z$ for $C^*$. See
Figure 4.

Figure 4: Case 2.1: Blue nodes in $C^*$, red nodes in $C$.

In Case (2.2), we must have $d(z,T) \le 2$ and $d(z,x) > 2$, which implies
that $d(z,T) = 2$. Then there is some $a$ between $z$ and $T$, i.e. $d(z,a) = 1$
and $d(a,T) = 1$. This implies that $a \in C$ and covers $T$ for $C$, but $a$ can't
cover $x$ for $C$ as otherwise we would have $x$ covered by $z$ for $C^*$. See
Figure 5.

Figure 5: Case 2.2: Blue nodes in $C^*$, red nodes in $C$.

Case 3: $x \in V \setminus (C \cup C^*)$: In this case, $x$ is covered by some $y \in C$ and some
$z \in C^*$ with $d(z,y) = 1$. We will compare $x$ with another vertex $w \in V$
that is covered by $y$ for $C$ and show that they have different identifying
sets for $C$. We first note that we must have $d(w,y) \le d(x,y) = 1$ and
$d(w,z) \le d(x,z) = 2$, so $w$ is also covered by $z$ for $C^*$. Since $C^*$ is an
identifying code, there must be some $v \in C^*$ that either:

1. covers $x$ and not $w$, or

2. covers $w$ and not $x$.

See Figure 6.

Figure 6: Case 3: Blue nodes in $C^*$, red nodes in $C$.

In Case (1), since $x \notin C$, we have $d(v,x) = 2$ and there is some $a \in C$
between $v$ and $x$. As $v$ does not cover $w$ for $C^*$, we must have $d(a,w) > 1$,
so $a$ covers $x$ but not $w$ for $C$. See Figure 7.

Figure 7: Case 3.1: Blue nodes in $C^*$, red nodes in $C$.

For Case (2), since $x \notin C$, we have $d(v,x) > 1$. Also, as $v$ covers $w$ but
not $x$ for $C^*$, we must have $1 \le d(w,v) \le 2$.

If $d(w,v) = 1$, then $w \ne y$, as otherwise $v$ covers $x$ for $C^*$. This implies
that $d(w,x) > 1$, else $v$ would cover $x$ for $C^*$. So we must have $w \in C$,
hence $w$ covers itself but not $x$ for $C$. See Figure 8.

Figure 8: Case 3.2.1: Blue nodes in $C^*$, red nodes in $C$.

Finally, if $d(w,v) = 2$, then there must be some $a \in C$ with $d(a,w) = 1 =
d(a,v)$. This implies that $d(a,x) > 1$, since otherwise we would have $v$
covering $x$ for $C^*$. Hence $a$ covers $w$ but not $x$ for $C$. See Figure 9.

Figure 9: Case 3.2.2: Blue nodes in $C^*$, red nodes in $C$.

□


Next we would like to consider constructions using direct sums. This idea
comes from constructions developed for the cube graph $Q_n$ with edges between
strings with Hamming distance one (strings that differ in exactly one bit). For example,
results similar to the following two theorems would be nice.

Theorem 4.72 ([5]). Assume that $C$ is 1-identifying on $Q_n$. Then the direct
sum $\{0,1\} \oplus C$ is 1-identifying on $Q_{n+1}$ if and only if $d(c, C \setminus \{c\}) \le 1$ for all
$c \in C$.

Theorem 4.73 ([5]). If $C$ is 1-identifying on $Q_n$ then $C \oplus \{00, 01, 10, 11\}$ is
1-identifying on $Q_{n+2}$.

A first thought might be to consider $C^* = \{0,1\} \oplus C \oplus \{0,1\}$, where $C$ is a
1-identifying code on $B(2,n-2)$. However, if the vertex $0^{n-2} \in V(B(2,n-2))$
is in $C$ and is only covered by itself, then we have:

=  {o,l,io”-1,o”-1i,io"-2i} 

Bi(  o”-1i)nc*  =  {o,l,io”-1,o"-1i,io"-2i} 

Note that $B_1(10^{n-1}) \cap C^*$ is missing only the neighbor $110^{n-2}$, as if this vertex
were included then we must have $10^{n-3} \in C$, which would cover $0^{n-2}$. Likewise,
$B_1(0^{n-1}1) \cap C^*$ is missing $0^{n-2}11$ because we cannot have $0^{n-3}1 \in C$.

An  initial  result  similar  to  these  is  the  following. 

Theorem 4.74. Let $C'$ be a 1-identifying code on $B(2,n-1)$ such that for
every $x \in V(B(2,n-1))$ we have at least one undirected in-neighbor of $x$ in $C'$,
and at least one undirected out-neighbor of $x$ in $C'$. Then $C = \{0,1\} \oplus C'$ is a
1-identifying code on $B(2,n)$.

Proof. We must show that $C$ satisfies the following two conditions: (1) every
$x \in V(B(2,n))$ has a neighbor in $C$, and (2) for every distinct $x, y \in V(B(2,n))$ we have
$\mathrm{ID}_C(x) \ne \mathrm{ID}_C(y)$.

1. Let $x = x_1 x_2 \cdots x_n \in V(B(2,n))$. Then we must have that $x' = x_2 x_3 \cdots x_n$
is covered by some undirected out-neighbor $z' = x_3 \cdots x_n z_n \in C'$. Then
we must have

$\{0x_3 x_4 \cdots x_n z_n,\ 1x_3 x_4 \cdots x_n z_n\} = \{0,1\} \oplus z' \subseteq C.$

In other words, we have $x_2 x_3 \cdots x_n z_n \in C$, which is also a neighbor of $x$,
so $x$ is covered.
2. Let $x = x_1 x_2 \cdots x_n$, $y = y_1 y_2 \cdots y_n \in V(B(2,n))$ be distinct. Define
$x' = x_2 x_3 \cdots x_n$ and $y' = y_2 y_3 \cdots y_n$. We have two cases.

$x' = y'$: In this case we must have $y_1 = \bar{x}_1$. Let $w' \in C'$ be an undirected
in-neighbor to $x' = y'$. Then we must have $w_3 \cdots w_n = x_2 \cdots x_{n-1}$,
and hence

$w_2 w_3 \cdots w_n \in \{x_1 x_2 \cdots x_{n-1},\ \bar{x}_1 x_2 \cdots x_{n-1}\}.$

Thus we must have that $0 \oplus w'$ and $1 \oplus w'$ are elements in $C$ and also
undirected in-neighbors of either $x$ or $y$, but not both. Therefore
$w \in \mathrm{ID}_C(x) \triangle \mathrm{ID}_C(y)$, so $\mathrm{ID}_C(x) \ne \mathrm{ID}_C(y)$.

$x' \ne y'$: Without loss of generality we may assume that there is some
$w' \in \mathrm{ID}_{C'}(x') \setminus \mathrm{ID}_{C'}(y')$. This implies that both $w' \not\to y'$ and $y' \not\to
w'$. In other words, we know that both $w_3 \cdots w_n \ne y_2 \cdots y_{n-1}$ and
$w_2 \cdots w_{n-1} \ne y_3 \cdots y_n$. Hence $y_1 y_2 \cdots y_{n-1} \notin \{0,1\} \oplus w_3 \cdots w_n$ and
$y_2 y_3 \cdots y_n \notin \{0,1\} \oplus w_2 \cdots w_{n-1}$, and therefore $w \not\to y$ and $y \not\to w$.
Thus $w \in \mathrm{ID}_C(x) \setminus \mathrm{ID}_C(y)$, so the identifying sets are distinct.


□ 

While the requirement that every vertex have both an in-neighbor and an out-neighbor in the identifying code is much stronger than the requirements in Theorem 4.72, it does still provide identifying codes of small cardinality. For example, the minimum size of an identifying code in the graph $B(2,3)$ is 4; however, no identifying codes of size 4 are extendable under this operation. Of the 18 identifying codes of size 5, 8 are extendable to $B(2,4)$, and two of the 8 satisfy the conditions of Theorem 4.74. These two codes are shown in Figure 10.


Figure 10: Identifying codes for $B(2,3)$ of size 5 satisfying the conditions of Theorem 4.74

For  certain  cases,  building  identifying  codes  is  much  simpler.  We  have  the 
following  two  theorems  for  the  graphs  B(d,  2)  for  all  d.  Note  that  the  first 
theorem  allows  for  exactly  one  empty  identifying  set. 

Theorem 4.75. The set $S = \{0i, i0 \mid i \in [d-2]\}$ is an identifying code for $B(d, 2)$.

Proof. First, for $ab \in V(B(d,2))$, we have the following identifying sets $ID(ab)$.

1. If $ab = 00$, then $ID(ab) = S$.




2. If $ab = 0b$ for $b \in [d-2]$, then $ID(ab) = \{i0 \mid i \in [d-2]\} \cup \{0b\}$.

3. If $ab = 0(d-1)$, then $ID(ab) = \{i0 \mid i \in [d-2]\}$.

4. If $ab = a0$ for $a \in [d-2]$, then $ID(ab) = \{0i \mid i \in [d-2]\} \cup \{a0\}$.

5. If $ab = (d-1)0$, then $ID(ab) = \{0i \mid i \in [d-2]\}$.

6. If $ab = ab$ for $a, b \in [d-2]$, then $ID(ab) = \{0a\} \cup \{b0\}$.

7. If $ab = a(d-1)$ for $a \in [d-2]$, then $ID(ab) = \{0a\}$.

8. If $ab = (d-1)b$ for $b \in [d-2]$, then $ID(ab) = \{b0\}$.

9. If $ab = (d-1)(d-1)$, then $ID(ab) = \{\}$.

Next, we must show that for any distinct pair $ab, xy \in V(B(d,2))$, $ID(ab) \ne ID(xy)$. Clearly if $ab$ and $xy$ are of different types, then $ID(ab) \ne ID(xy)$. Also, types (1), (3), (5), and (9) each contain only one vertex. That only leaves us with the following five cases, in which $ab$ and $xy$ are distinct vertices of the same type.

2. We must have $0b \ne 0y$. Then since $0b \in ID(0b) \setminus ID(0y)$, we know that $ID(0b) \ne ID(0y)$.

4. We must have $a0 \ne x0$. Then since $a0 \in ID(a0) \setminus ID(x0)$, we know that $ID(a0) \ne ID(x0)$.

6. We must have either $a \ne x$, and so $0a \in ID(ab) \setminus ID(xy)$, or $b \ne y$ and hence $b0 \in ID(ab) \setminus ID(xy)$. In either case, we have $ID(ab) \ne ID(xy)$.

7. We have $a(d-1) \ne x(d-1)$. Then since $0a \in ID(a(d-1)) \setminus ID(x(d-1))$, we know that $ID(a(d-1)) \ne ID(x(d-1))$.

8. We have $(d-1)b \ne (d-1)y$. Then since $b0 \in ID((d-1)b) \setminus ID((d-1)y)$, we know that $ID((d-1)b) \ne ID((d-1)y)$.


□ 
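The case analysis in the proof of Theorem 4.75 can be checked by brute force for small alphabets. The following is a minimal sketch, not the report's own code: the function names are hypothetical, vertices are tuples of integers, $[d-2]$ is assumed to mean $\{1, \ldots, d-2\}$, and identifying sets are assumed to be taken over closed neighborhoods in the undirected graph.

```python
from itertools import product

def closed_neighborhood(v, d):
    """Closed neighborhood N[v] in the undirected de Bruijn graph B(d, n)."""
    out_nbrs = {v[1:] + (a,) for a in range(d)}   # shift left, append a
    in_nbrs = {(a,) + v[:-1] for a in range(d)}   # shift right, prepend a
    return {v} | out_nbrs | in_nbrs

def identifying_sets(code, d, n):
    """Map every vertex v to its identifying set N[v] ∩ code."""
    return {v: frozenset(closed_neighborhood(v, d) & code)
            for v in product(range(d), repeat=n)}

def theorem_4_75_code(d):
    """S = {0i, i0 | i in [d-2]}, assuming [d-2] = {1, ..., d-2}."""
    inner = range(1, d - 1)
    return {(0, i) for i in inner} | {(i, 0) for i in inner}

def all_sets_distinct(code, d, n):
    """True if no two vertices share an identifying set."""
    ids = identifying_sets(code, d, n)
    return len(set(ids.values())) == len(ids)

if __name__ == "__main__":
    for d in range(4, 8):
        S = theorem_4_75_code(d)
        print(d, len(S), all_sets_distinct(S, d, 2))
```

The sketch is run for $d \ge 4$ only. For $d = 3$ the set $[d-2]$ has a single element, and the vertices 00 and 01 then receive the same identifying set $\{01, 10\}$, so types (1) and (2) can no longer be told apart.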

The previous theorem gives us an identifying code for $B(d, 2)$ of size $2(d-2)$. The next theorem illustrates an identifying code for $B(d, 2)$ of size $\lfloor 3d/2 \rfloor$, which is an improvement over the last result whenever $d > 8$.

Theorem 4.76. Define the following sets.

$S = \{12, 23, 34, \ldots, (d-1)d, d1\}$

$T = \begin{cases} \{13, 35, 57, \ldots, (d-2)d\}, & \text{if } d \text{ is odd}; \\ \{13, 35, 57, \ldots, (d-1)1\}, & \text{if } d \text{ is even}. \end{cases}$

Then $S \cup T$ is an identifying code for $B(d, 2)$.
Proof. First, we show that every vertex in $B(d, 2)$ is covered by $S$. For any $ab \in B(d, 2)$, $ab$ is adjacent to both $b(b+1)$ and $(a-1)a$.

Next, we must show that all identifying sets are unique. Note that if a vertex $ab$ has only one neighbor in $S$, then $a = b+1$, so $ab = (b+1)b$. Note that for all $b$, there is exactly one vertex $b(b+1)$ with $|N[(b+1)b] \cap S| = 1$, so the identifying sets are unique.

Otherwise, we assume $ab \ne (b+1)b$ and $xy \ne (y+1)y$ with $ab \ne xy$ and show that $N[xy] \cap T \ne N[ab] \cap T$. Note that if $N[ab] \cap S \ne N[xy] \cap S$ then we are done, so we assume otherwise. Then we have $\{b(b+1), (a-1)a\} = \{y(y+1), (x-1)x\}$. This gives us two cases.

1. $b(b+1) = y(y+1)$ and $(a-1)a = (x-1)x$.

In this case we have $b = y$ and $a = x$, or $ab = xy$, which is a contradiction.

2. $b(b+1) = (x-1)x$ and $(a-1)a = y(y+1)$.

In this case we have $b = x-1$ and $a = y+1$. In other words, we have $ab = (y+1)(x-1)$. We have two subcases.

2.1 If $x$ is odd, then $x-1$ is even. Then $T \cap N[xy]$ contains $(x-2)x$. Note that $N[(y+1)(x-1)] \cap T$ contains $(x-2)x$ only if $x = y+1$, which is a contradiction. Thus we must have

$(x-2)x \in (N[xy] \cap T) \setminus (N[(y+1)(x-1)] \cap T)$.

2.2 If $x$ is even, then $x-1$ is odd. Then $N[(y+1)(x-1)] \cap T$ contains $(x-1)(x+1)$. Note that $N[xy] \cap T$ contains $(x-1)(x+1)$ only if $y = x-1$, or $x = y+1$, which is a contradiction. Thus we must have

$(x-1)(x+1) \in (N[(y+1)(x-1)] \cap T) \setminus (N[xy] \cap T)$.


□


4.4  De  Bruijn  Functions 

4.4.1  Distance 

Many  parameters  that  we  are  interested  in  rely  on  distance  in  de  Bruijn  graphs. 
Fortunately,  formulas  for  distance  in  both  the  directed  and  undirected  graphs 
have  already  been  determined. 

Theorem 4.77 ([17]). For all $X, Y$ in the directed graph $B(d,n)$,

$d(X, Y) = n - \max\{s \mid 1 \le s \le n,\ x_{n-s+1} x_{n-s+2} \ldots x_n = y_1 y_2 \ldots y_s\}$

where, by convention, the maximum over an empty set is zero.

The formula for the directed graph is straightforward: it finds the longest suffix of $X$ that matches a prefix of $Y$ and subtracts its length from $n$.
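The suffix and prefix matching in Theorem 4.77 translates almost directly into code. The following is a minimal sketch, assuming vertices are given as Python strings of equal length; the function name is hypothetical and is not the report's library function DD.

```python
def directed_distance(x: str, y: str) -> int:
    """Distance from x to y in the directed de Bruijn graph (Theorem 4.77)."""
    n = len(x)
    # Try the longest overlap first: suffix of x of length s vs. prefix of y.
    for s in range(n, 0, -1):
        if x[n - s:] == y[:s]:
            return n - s
    # No overlap at all: the maximum over the empty set is taken to be zero.
    return n

if __name__ == "__main__":
    print(directed_distance("010", "110"))
    print(directed_distance("01", "10"))
```

Scanning from the longest overlap downward lets the function return at the first match, so it never needs to track a running maximum.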
Theorem 4.78 ([17]). For all $X, Y$ in the undirected graph $B(d,n)$,

$d(X, Y) = 2n - 1 + \min\left\{ \min_{1 \le i,j \le n} \left(i - j - \ell_{i,j}(X,Y)\right),\ \min_{1 \le i,j \le n} \left(-i + j - r_{i,j}(X,Y)\right) \right\}$

where

$\ell_{i,j}(X,Y) = \max\{s \mid s \le j,\ s \le n-i+1,\ x_i x_{i+1} \ldots x_{i+s-1} = y_{j-s+1} y_{j-s+2} \ldots y_j\}$

$r_{i,j}(X,Y) = \max\{s \mid s \le i,\ s \le n-j+1,\ x_{i-s+1} x_{i-s+2} \ldots x_i = y_j y_{j+1} \ldots y_{j+s-1}\}$

where, by convention, the maximum over an empty set is zero.

The following example illustrates the mechanics of this formula for the undirected graph. Let $X = 010$ and $Y = 110$ in the undirected de Bruijn graph $B(2,3)$. Then, using Liu's algorithm, we must compute the following.

$D(X,Y) = 2n - 1 + \min_{1 \le i,j \le n} \left(i - j - \max\{\ell_{i,j}(X,Y), r_{j,i}(X,Y)\}\right)$

$= 5 + \min_{1 \le i,j \le 3} \left(i - j - \max\{\ell_{i,j}(X,Y), r_{j,i}(X,Y)\}\right)$

Our functions $\ell_{i,j}$ and $r_{j,i}$ are as follows.

$\ell_{i,j} = \max\{s \mid s \le j,\ s \le 4-i,\ x_i x_{i+1} \ldots x_{i+s-1} = y_{j-s+1} y_{j-s+2} \ldots y_j\}$

$r_{j,i} = \max\{s \mid s \le j,\ s \le 4-i,\ x_{j-s+1} x_{j-s+2} \ldots x_j = y_i y_{i+1} \ldots y_{i+s-1}\}$

We find the following values. In the third and fourth columns, the set lists the admissible values of $s$, followed by its maximum.

i   j   l_{i,j}(X,Y)   r_{j,i}(X,Y)   max{l_{i,j}, r_{j,i}}   i - j - max{l_{i,j}, r_{j,i}}
1   1   {} = 0         {} = 0         0                       0
1   2   {} = 0         {1} = 1        1                       -2
1   3   {1} = 1        {} = 0         1                       -3
2   1   {1} = 1        {} = 0         1                       0
2   2   {1} = 1        {1} = 1        1                       -1
2   3   {2} = 2        {2} = 2        2                       -3
3   1   {} = 0         {1} = 1        1                       1
3   2   {} = 0         {} = 0         0                       1
3   3   {1} = 1        {1} = 1        1                       -1

The minimum in the right-most column is $-3$, and so we find $D(X,Y) = 5 + (-3) = 2$.

We have implemented these functions in Matlab; see Algorithms 1, 2, and 3.

We propose the following conjecture. Define the distance class $t$ for vertex $X$ as the set

$D_t(X) = \{Y \mid d(X,Y) = t\}$.

Conjecture 4.79. The set $V \setminus \{D_0(0^n), D_1(0^n), D_n(0^n)\}$ is an identifying code for $B(2, n)$ when $n > 4$.
Algorithm 1 LHS(i, j, X, Y): Computing $\ell_{i,j}(X,Y)$

procedure LHS(i, j, X, Y)
    n = length(X)
    sMax = 0
    s = 0
    while (s ≤ j) and (s ≤ n - i + 1) do
        if x_i x_{i+1} ... x_{i+s-1} = y_{j-s+1} y_{j-s+2} ... y_j then
            sMax = s
        end if
        s = s + 1
    end while
    return sMax
end procedure


Algorithm 2 RHS(i, j, X, Y): Computing $r_{i,j}(X,Y)$

procedure RHS(i, j, X, Y)
    n = length(X)
    sMax = 0
    s = 0
    while (s ≤ i) and (s ≤ n - j + 1) do
        if x_{i-s+1} x_{i-s+2} ... x_i = y_j y_{j+1} ... y_{j+s-1} then
            sMax = s
        end if
        s = s + 1
    end while
    return sMax
end procedure


Algorithm 3 D(X, Y): Computing the distance between X and Y

procedure D(X, Y)
    n = length(X)
    M = zeros(n, n)
    for i from 1 to n do
        for j from 1 to n do
            M(i, j) = i - j - max{LHS(i, j, X, Y), RHS(j, i, X, Y)}
        end for
    end for
    distance = 2n - 1 + min(min(M), [ ], 2)
    return distance
end procedure
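For readers who want to experiment outside Matlab, Algorithms 1 through 3 can be sketched in Python as follows. This is an illustrative translation under stated assumptions, not the report's implementation: the function names are hypothetical, strings are 0-indexed internally while $i$ and $j$ remain 1-based as in Theorem 4.78, and the loop bounds are read as $s \le j$ and $s \le n-i+1$.

```python
def lhs(i, j, x, y):
    """l_{i,j}(X,Y): largest s with x_i..x_{i+s-1} == y_{j-s+1}..y_j (1-based)."""
    n = len(x)
    best = 0
    for s in range(1, min(j, n - i + 1) + 1):
        if x[i - 1:i - 1 + s] == y[j - s:j]:
            best = s
    return best

def rhs(i, j, x, y):
    """r_{i,j}(X,Y): largest s with x_{i-s+1}..x_i == y_j..y_{j+s-1} (1-based)."""
    n = len(x)
    best = 0
    for s in range(1, min(i, n - j + 1) + 1):
        if x[i - s:i] == y[j - 1:j - 1 + s]:
            best = s
    return best

def undirected_distance(x, y):
    """Distance in the undirected de Bruijn graph, following Algorithm 3."""
    n = len(x)
    return 2 * n - 1 + min(
        i - j - max(lhs(i, j, x, y), rhs(j, i, x, y))
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )

if __name__ == "__main__":
    print(undirected_distance("010", "110"))  # the worked example: 2
```

As in Algorithm 3, the second minimum of Theorem 4.78 is folded into the first by evaluating $r$ with its indices swapped, so a single pass over all pairs $(i, j)$ suffices.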
The sets $D_0(0^n)$ and $D_1(0^n)$ appearing in Conjecture 4.79 are easily determined:

$D_0(0^n) = \{0^n\}$, and
$D_1(0^n) = \{10^{n-1}, 0^{n-1}1\}$.

The set $D_n(0^n)$, however, is a more difficult computation. In order to determine this set, we must determine all $X \in B(2, n)$ such that $d(X, 0^n) = n$. As a start, we direct the interested reader to Lemma 3.35, which shows that in the nonbinary case for a specific node there is always at least one other node at distance $n$.

4.4.2  Balls  in  Directed  de  Bruijn  Graphs 

We begin by exploring formulas for $B_t^-(x)$.

Lemma 4.80. $B_t^-(x_1 x_2 \ldots x_n) = \bigcup_{i=0}^{t} A_d^i \oplus x_1 x_2 \ldots x_{n-i}$.

Definition 4.81. We will refer to the set $A_d^i \oplus x_1 x_2 \ldots x_{n-i}$ from Lemma 4.80 as suffix class $i$ with respect to $x$. We will denote this by $S_i(x)$.

Lemma 4.82. $|S_i(x)| = d^i$.

Definition 4.83. If there exist $i, j \in [0, t]$ with $i < j$ such that $x_1 x_2 \ldots x_{n-j} = x_{j-i+1} x_{j-i+2} \ldots x_{n-i}$, then we will say that $x$ has an $(i, j)$-shift.

The  following  lemma  makes  clear  why  we  chose  this  notation. 

Lemma 4.84. If there exists a string $y$ and $i, j \in [0, t]$ with $i < j$ such that $y \in S_i(x)$ and $y \in S_j(x)$, then $x$ has an $(i,j)$-shift.

Proof. We know that the following two equalities must be true.

$y_{i+1} y_{i+2} \ldots y_n = x_1 x_2 \ldots x_{n-i}$
$y_{j+1} y_{j+2} \ldots y_n = x_1 x_2 \ldots x_{n-j}$

Since $i < j$, note that the second equality compares shorter strings. Thus we deduce that the following equalities must also hold.

$x_{n-i} = y_n = x_{n-j}$
$x_{n-i-1} = y_{n-1} = x_{n-j-1}$
$\vdots$
$x_{j-i+1} = y_{j+1} = x_1$

Thus we have $x_1 x_2 \ldots x_{n-j} = x_{j-i+1} x_{j-i+2} \ldots x_{n-i}$, as required.

□

Lemma 4.85. If $x = x_1 x_2 \ldots x_n$ has an $(i,j)$-shift, then $S_i(x) \subseteq S_j(x)$.

Proof. Reversing the argument from Lemma 4.84, we have that an $(i,j)$-shift implies that

$y_{i+1} y_{i+2} \ldots y_n = x_1 x_2 \ldots x_{n-i}$.

Since there are no restrictions on $y_1 y_2 \ldots y_i$, we see that any string that satisfies these restrictions on $y_{i+1} y_{i+2} \ldots y_n$ is in both suffix classes. This describes all strings in suffix class $i$, and hence we must have $S_i(x) \subseteq S_j(x)$. □
Combining the last two lemmas, we arrive at the more complete version.

Lemma 4.86. String $x$ has an $(i,j)$-shift if and only if $S_i(x) \subseteq S_j(x)$.

Lemma 4.87. If $x$ has no $(i, j)$-shifts, then

$|B_t^-(x)| = \left| \bigcup_{i=0}^{t} S_i(x) \right| = \sum_{i=0}^{t} d^i$.

Lemma 4.88. If $x$ admits an $(i,j)$-shift, then there are $d^i$ double-counted strings in $B_t^-(x)$.

Proof. By Lemma 4.87, if there are no $(i, j)$-shifts, then $|B_t^-(x)| = \sum_{i=0}^{t} d^i$. By Lemma 4.86, there is a bijection between $(i,j)$-shifts and nested suffix classes as follows.

$(i, j) \mapsto (S_i(x), S_j(x))$

In other words, each $(i, j)$-shift corresponds to every string in $S_i(x)$ also being counted in $S_j(x)$. Thus, for counting purposes, we have $d^i$ double-counted strings. □

Lemma 4.89. If $x$ admits an $(i,j)$-shift and a $(j,k)$-shift, then it admits an $(i,k)$-shift.

Proof. By Lemma 4.86, an $(i,j)$-shift implies $S_i(x) \subseteq S_j(x)$, and a $(j,k)$-shift implies $S_j(x) \subseteq S_k(x)$. Hence we must also have $S_i(x) \subseteq S_k(x)$, which corresponds to an $(i,k)$-shift. □

Theorem 4.90.

$|B_t^-(x)| = \sum_{i=0}^{t} d^i - \left( \sum_{\substack{i \in [0, t-1] \\ \exists\, (i,j)\text{-shift for some } j}} d^i \right)$


Proof. By Lemma 4.87, we start with the total number of strings (including multiplicities) at $\sum_{i=0}^{t} d^i$. By Lemma 4.88, we need to subtract $d^i$ when there exists an $(i,j)$-shift, but only subtract once for each $i$. Thus we arrive at the given formula. □
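The counting formula of Theorem 4.90 can be compared against a direct enumeration. The sketch below is illustrative only: the function names are hypothetical, $\oplus$ is assumed to denote concatenation, $B_t^-(x)$ is assumed to be the set of vertices that can reach $x$ in at most $t$ directed steps, and $t \le n$ and $d \le 10$ are assumed.

```python
def has_shift(x, i, j):
    """True if x has an (i, j)-shift: x_1..x_{n-j} == x_{j-i+1}..x_{n-i}."""
    n = len(x)
    return x[:n - j] == x[j - i:n - i]

def in_ball_size(x, d, t):
    """|B_t^-(x)| computed with the formula of Theorem 4.90."""
    total = sum(d ** i for i in range(t + 1))
    for i in range(t):
        # Subtract d^i once if suffix class i is nested in some larger class.
        if any(has_shift(x, i, j) for j in range(i + 1, t + 1)):
            total -= d ** i
    return total

def in_ball_bfs(x, d, t):
    """B_t^-(x) by breadth-first search over predecessors (assumes d <= 10)."""
    alphabet = "0123456789"[:d]
    ball, frontier = {x}, {x}
    for _ in range(t):
        # z is a predecessor of y when z = a + y_1..y_{n-1} for some letter a.
        frontier = {a + y[:-1] for y in frontier for a in alphabet} - ball
        ball |= frontier
    return ball

if __name__ == "__main__":
    print(in_ball_size("0101", 2, 2), len(in_ball_bfs("0101", 2, 2)))
```

For $x = 0101$ and $t = 2$, the only shift is the $(0, 2)$-shift, so the formula gives $1 + 2 + 4 - 1 = 6$, which agrees with the enumeration.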


4.4.3  Other  Useful  Functions  and  Matlab 

In this section we present some de Bruijn functions that return many useful parameters. Although the following functions are useful enough to warrant their inclusion in this report, they do not, taken individually, justify their own dedicated section. Therefore, we opted to present them all in this section. The first of these functions is called GenerateNodes. This function returns the string representation of every node in the desired base.
Algorithm 4 GenerateNodes: Generates all strings of length n and base d
1: procedure GenerateNodes(d, n)
2:     N = d^n
3:     nodes = cell(1, N)
4:     for i from 1 to N do
5:         nodes{i} = dec2base(i - 1, d, n)
6:     end for
7:     return nodes
8: end procedure


Notice that in line 5 the MATLAB string library function dec2base(i - 1, d, n) is called. This is where the true work gets done in this function. This method converts the value $i - 1$ from a decimal number to one of base $d$, padded with leading zeros to length $n$. As an example, we executed GenerateNodes(3, 2) in MATLAB, and the output is provided below.

ans = {00, 01, 02, 10, 11, 12, 20, 21, 22}
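The base conversion performed by dec2base can be sketched in a few lines of Python. This is an illustrative analogue rather than a port of the library: the helper names to_base and generate_nodes are hypothetical, and the sketch assumes $d \le 36$ and values below $d^n$.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(value: int, d: int, n: int) -> str:
    """Write value in base d as a string of exactly n digits (value < d**n)."""
    out = []
    for _ in range(n):
        value, r = divmod(value, d)   # peel off the least significant digit
        out.append(DIGITS[r])
    return "".join(reversed(out))

def generate_nodes(d: int, n: int) -> list:
    """All d**n vertex labels of B(d, n), in increasing numeric order."""
    return [to_base(i, d, n) for i in range(d ** n)]

if __name__ == "__main__":
    print(generate_nodes(3, 2))
```

Because Python lists are 0-indexed, the sketch converts $i$ directly instead of $i - 1$.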

An alternative method for representing a graph, rather than the vertex-and-arc drawing that has been used up until now in this report, is to use an $N$-by-$N$ matrix, where $N$ is the number of vertices in the graph. For example, an adjacency matrix records which vertices in a graph are adjacent to which other vertices. The code to produce an adjacency matrix for $B(d, n)$ follows.


Algorithm 5 AdjacencyMat: Generates Directed Adjacency Matrix
1: procedure AdjacencyMat(d, n)
2:     N = d^n
3:     for i from 1 to N do
4:         x = dec2base(i - 1, d, n)
5:         for j from 0 to d - 1 do
6:             y = strcat(x(2 : n), num2str(j))
7:             z = base2dec(y, d)
8:             A(i, z + 1) = 1
9:         end for
10:    end for
11:    return A
12: end procedure


AdjacencyMat uses a few more MATLAB string library functions. In line 4 we see the function dec2base(decimal, base, length) again. In this case, the function converts the decimal value $i - 1$ to the base specified in the second parameter, padded to the length given in the third. The function base2dec() in line 7 performs the opposite operation. The function num2str(j) converts a number $j$ into a string. The expression x(2 : n) keeps positions 2 through $n$ of the string $x$ (that is, it drops the first character), and strcat appends the digit $j$ to the result. The purpose of these calls is revealed in line 8, where the matrix $A$ gets populated as adjacent nodes are found. That is, the function visits every row $i$ and places a 1 in column $z + 1$ for each of the $d$ out-neighbors of the corresponding node. When this method is executed using the following command (as an example) in MATLAB, “AdjacencyMat(2, 4)”, the adjacency matrix in Table 4.1 is returned.

To read the MATLAB output for AdjacencyMat(2, 4) (in Table 4.1), select a vertex string identifier (name) from the row label (highlighted blue) and then scan across the row associated with the vertex. If a 1 appears in the row, then the vertex listed in the corresponding column label (also highlighted blue) is adjacent to the vertex under consideration. By contrast, a 0 in the row indicates that the vertex in the column label is not adjacent to the vertex under consideration. From the standpoint of efficiency, AdjacencyMat is somewhat wasteful. The nested for loops perform only $N \cdot d$ iterations, but the function builds and returns an $N$-by-$N$ matrix, so its storage grows quadratically in $N$. In addition to this, the MATLAB functions that are called within AdjacencyMat, such as strcat, num2str, and base2dec, undoubtedly come at a cost as well.
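The string operations in AdjacencyMat can be avoided altogether by working with the integer values of the vertex labels. The sketch below swaps in that integer-arithmetic technique, so it is an alternative to the report's string-based method rather than a translation of it; the function name is hypothetical and rows and columns are 0-indexed.

```python
def adjacency_matrix(d: int, n: int) -> list:
    """Directed adjacency matrix of B(d, n) as a list of 0/1 rows."""
    N = d ** n
    A = [[0] * N for _ in range(N)]
    for i in range(N):
        # Dropping the leading digit and appending 0 is (i * d) mod N.
        base = (i * d) % N
        for j in range(d):
            A[i][base + j] = 1   # out-neighbor x_2 ... x_n j
    return A

if __name__ == "__main__":
    for row in adjacency_matrix(2, 3):
        print(row)
```

Each row contains exactly $d$ ones in consecutive columns, which makes the sparsity of the matrix easy to see.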

Another  matrix  that  can  be  used  to  represent  a  de  Bruijn  graph  is  a  distance 
matrix.  Unlike  an  adjacency  matrix,  a  distance  matrix  shows  the  distance  from 
every  node  to  every  other  node  in  the  graph.  In  our  de  Bruijn  library  of  functions 
there  is  a  method  to  generate  a  directed  distance  matrix.  The  code  follows. 


Algorithm 6 DirectedDistanceMat: Generates Directed Distance Matrix
1: procedure DirectedDistanceMat(d, n)
2:     N = d^n
3:     for i from 1 to N do
4:         for j from 1 to N do
5:             nodes(i, j) = DD(dec2base(i - 1, d, n), dec2base(j - 1, d, n))
6:         end for
7:     end for
8:     return nodes
9: end procedure


The function calls our library function DD(X, Y) in line 5, which computes the distance between two strings in the directed de Bruijn graph. Once the node names are generated, DD(X, Y) computes the distance and populates the nodes array. After exiting the outer for loop, nodes is returned. The cost of DirectedDistanceMat is at least quadratic in $N$, $O(N^2)$, due to the nested for loops, and this cost does not take into account the called library function dec2base(). Since DD(X, Y) is linear in $n$, it does not have much of an effect on efficiency. We executed DirectedDistanceMat(2, 4) using MATLAB, which returned the distance matrix in Table 4.2.

The  undirected  counterpart  for  DirectedDistanceMat  in  the  DeBruijn  library 
is  UndirectedDistanceMat. 

About the only thing worth mentioning for this function is that its time complexity is very poor. This function calls UD(X, Y), which
Algorithm 7 UndirectedDistanceMat: Generates Undirected Distance Matrix

procedure UndirectedDistanceMat(d, n)
    N = d^n
    for i from 1 to N do
        for j from 1 to N do
            nodes(i, j) = UD(dec2base(i - 1, d, n), dec2base(j - 1, d, n))
        end for
    end for
    return nodes
end procedure


runs in $O(n^3)$ time. Exacerbating an already inefficient function, UndirectedDistanceMat() itself makes $O(N^2)$ such calls. This brings the total cost to $O(n^3) \times O(N^2) = O(n^3 d^{2n})$. The distance matrix representing the output for UndirectedDistanceMat(2, 4) is in Table 4.3.

As mentioned above, AdjacencyMat (like the distance matrices) is wasteful. Looking at the returned matrix in Table 4.1, one of the first things we notice is that the array is nearly filled with zeros. This is because de Bruijn graphs are sparse: they have relatively few edges when compared to the number of pairs of nodes. The adjacency matrix goes through a lot of work to produce a relatively small amount of useful information. The next function, VectorNeighborGenerator, returns the same useful information without the superfluous generation and storage of useless data. Note that this function omits the self-loops on nodes of the form $a^n$.

Running this method with the same parameters as AdjacencyMat(2, 4) in the MATLAB command shell, we receive the following smaller data structure in return.

VectorNeighborGenerator(2, 4):

ans = {0000 : 0001, 0001 : 0010, 0001 : 0011, 0010 : 0100,
0010 : 0101, 0011 : 0110, 0011 : 0111, 0100 : 1000, 0100 : 1001,
0101 : 1010, 0101 : 1011, 0110 : 1100, 0110 : 1101, 0111 : 1110,
0111 : 1111, 1000 : 0000, 1000 : 0001, 1001 : 0010, 1001 : 0011,
1010 : 0100, 1010 : 0101, 1011 : 0110, 1011 : 0111, 1100 : 1000,
1100 : 1001, 1101 : 1010, 1101 : 1011, 1110 : 1100, 1110 : 1101,
1111 : 1110}

This function is an improvement over AdjacencyMat because it runs in linear time with respect to $N$ (omitting any costs attributed to the called function strcmp()). Another improvement is that it does not return any useless values: instead of returning $d^{2n}$ values, it returns only $d^{n+1} - d$ values. In the case of $B(2,4)$ above it returned 30 values instead of 256. This
Algorithm 8 VectorNeighborGenerator: Linked-list representation of graph
1: procedure VectorNeighborGenerator(d, n)
2:     N = d^n
3:     size = Nd - d
4:     nodes = GenerateNodes(d, n)
5:     edges = cell(1, size)
6:     nNeighbor = 0
7:     count = 1
8:     currentNode = 1
9:     for i from 1 to size + d do
10:        nNeighbor = nNeighbor + 1
11:        if nNeighbor > N then
12:            nNeighbor = 1
13:        end if
14:        if ~strcmp(nodes{currentNode}, nodes{nNeighbor}) then
15:            edges{count - 1} =
16:                strcat(nodes{currentNode}, char(':'), nodes{nNeighbor})
17:        else
18:            if count ≠ 1 then
19:                count = count - 1
20:            end if
21:        end if
22:        if (mod(i, d) = 0) and (nNeighbor ≠ 0) then
23:            currentNode = currentNode + 1
24:        end if
25:        count = count + 1
26:    end for
27:    return edges
28: end procedure
is an 88.3% reduction in the number of returned values. Tracing the code, note that the variable size is set to $Nd - d$. This is the length of the return structure, a linear cell array called edges.
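The same edge list can be produced without index bookkeeping by building each out-neighbor directly from the node's string. The following is a minimal sketch that swaps in this simpler technique; it does not reproduce the hopping scheme of VectorNeighborGenerator, the function name is hypothetical, and $d \le 10$ is assumed so that single characters can serve as letters.

```python
from itertools import product

def edge_list(d: int, n: int) -> list:
    """Arcs of the directed B(d, n) as 'x:y' strings, self-loops omitted."""
    alphabet = "0123456789"[:d]          # assumes d <= 10
    edges = []
    for letters in product(alphabet, repeat=n):
        x = "".join(letters)
        for a in alphabet:
            y = x[1:] + a                # drop first letter, append a
            if y != x:                   # skip the loop on nodes of the form a^n
                edges.append(f"{x}:{y}")
    return edges

if __name__ == "__main__":
    edges = edge_list(2, 4)
    print(len(edges), edges[:3])
```

For $B(2,4)$ this yields $2^5 - 2 = 30$ entries, matching the count of $d^{n+1} - d$ given in the text.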

Also note that in line 4 VectorNeighborGenerator makes use of our library function GenerateNodes. It does this so that it may perform a string comparison rather than a nested loop search. The function uses two variables: currentNode, which represents the current node under consideration, and nNeighbor. The variable nNeighbor gets assigned by following a programmed route of nodes that are within the currentNode's “reach” when calculating its $d$-length hops. These hops follow a predictable path, as illustrated in Figure 11.


Figure 11: Hops in $B(3, 2)$. There are $|E| = 27$ edges and $|N| = 9$ sets of $d = 3$ hop values, arranged into $d = 3$ alternations.

Using currentNode and nNeighbor in tandem, the linear array nodes is traversed, with currentNode simply iterating in sequence while nNeighbor follows its hopping route. In line 14 a negated string comparison checks that the two nodes are not equal (equality would indicate a self-loop). If they differ, the two nodes are stored together, separated by a colon, as one slot in the edges cell array (the assignment is done in line 15). After the loop counter reaches $i = size + d$ (in MATLAB, array indices always start at 1), we exit the loop and return edges, the populated cell array.

Generating  Balls  and  Spheres 

The last set of functions involves the generation of two very similar yet distinctly different lists of nodes. The first is a function to generate a list of all nodes within a prescribed distance from a given node. The de Bruijn library functions for generating the set of all nodes contained in a $t$-ball follow.

An example of this function is given below.

DirectedBallFromX(1, 01, 3, 2): ans = {01, 10, 11, 12}

This  function  calls  DirectedDistanceMat  after  which  it  parses  the  matrix 
looking  for  nodes  within  the  specified  distance,  t  =  1  from  a  specified  node  X  = 
Algorithm 9 DirectedBallFromX: Generates t-out ball from X
1: procedure DirectedBallFromX(t, X, d, n)
2:     N = d^n
3:     j = 1
4:     x = base2dec(X, d) + 1
5:     nodes = cell(1)
6:     B = DirectedDistanceMat(d, n)
7:     for i from 1 to N do
8:         if B(x, i) ≤ t then
9:             nodes{j} = dec2base(i - 1, d, n)
10:            j = j + 1
11:        end if
12:    end for
13:    return nodes
14: end procedure


01 (line 8). When found, these nodes within $t$ are converted to the desired base and placed into the cell array nodes, which is returned. The cost of DirectedBallFromX is quadratic with respect to $N$ because DirectedDistanceMat is called within it.

The undirected counterpart to DirectedBallFromX is called UndirectedBallFromX. This function also generates a list of nodes that are within a specified distance from a specified node, but the list returned generally contains more nodes, since edges can be traversed in either direction in the undirected graph. This function also calls its appropriate distance matrix, UndirectedDistanceMat, to assess the distances between nodes.


Algorithm 10 UndirectedBallFromX: Generates t-ball from X

procedure UndirectedBallFromX(t, X, d, n)
    N = d^n
    j = 1
    x = base2dec(X, d) + 1
    nodes = cell(1)
    B = UndirectedDistanceMat(d, n)
    for i from 1 to N do
        if B(x, i) ≤ t then
            nodes{j} = dec2base(i - 1, d, n)
            j = j + 1
        end if
    end for
    return nodes
end procedure


As expected, UndirectedBallFromX requires $O(n^3 N^2)$ time because it calls UndirectedDistanceMat (which makes $O(N^2)$ calls to the $O(n^3)$ distance function). Although UndirectedDistanceMat works well, it is not at all efficient.

Let us examine an application of an undirected de Bruijn graph where the function UndirectedBallFromX is utilized. Imagine a radio network modeled on the undirected de Bruijn graph $B(2,4)$ where the repeater tower represented by node 1100 is experiencing interference. In order for our transmitter, node 0000, to reach our receiver, node 1111, we must find a repeater tower path that avoids the defective tower. Executing the function UndirectedBallFromX(1, 1100, 2, 4) returns the list {0110, 1000, 1001, 1100, 1110}. By avoiding transmission through these towers, a fault-free route can be determined. The shortest such path is highlighted in green; starting from the transmitter, it follows nodes {0001, 0011, 0111, 1111} (see Figure 12).
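The radio network scenario can be reproduced with a breadth-first search that skips the blocked towers. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the ball is computed by search rather than from a distance matrix as in UndirectedBallFromX, and $d \le 10$ is assumed.

```python
from collections import deque

def neighbors(x, d=2):
    """Undirected neighbors of x in B(d, n), excluding x itself."""
    alphabet = "0123456789"[:d]
    out_nbrs = {x[1:] + a for a in alphabet}
    in_nbrs = {a + x[:-1] for a in alphabet}
    return (out_nbrs | in_nbrs) - {x}

def undirected_ball(t, x, d=2):
    """All nodes within undirected distance t of x."""
    ball, frontier = {x}, {x}
    for _ in range(t):
        frontier = {z for y in frontier for z in neighbors(y, d)} - ball
        ball |= frontier
    return ball

def shortest_path_avoiding(src, dst, blocked, d=2):
    """A shortest src-dst path that uses no blocked node, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        y = queue.popleft()
        if y == dst:
            path = []
            while y is not None:       # walk back through the parents
                path.append(y)
                y = parent[y]
            return path[::-1]
        for z in sorted(neighbors(y, d)):
            if z not in blocked and z not in parent:
                parent[z] = y
                queue.append(z)
    return None

if __name__ == "__main__":
    faulty = undirected_ball(1, "1100")
    print(sorted(faulty))
    print(shortest_path_avoiding("0000", "1111", faulty))
```

In this instance the detour is forced: 0001 is the only unblocked neighbor of the transmitter, 0111 is the only unblocked neighbor of the receiver, and 0011 is their only common neighbor.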


Figure 12: Radio network $B(2,4)$ showing avoidance of fault paths

The function DirectedAtFromX is similar to DirectedBallFromX except that instead of locating all nodes within a given distance, the function locates all nodes at exactly the designated distance. The set of all nodes at exactly distance $t$ from $X$ is also known as a $t$-sphere centered at $X$.

An example of this function is given below.

DirectedAtFromX(1, 01, 3, 2): ans = {10, 11, 12}

This  function  calls  DirectedDistanceMat  after  which  it  parses  the  matrix 
looking  for  nodes  at  the  specified  distance,  t  =  1  from  a  specified  node  X=01 
(line  8).  When  found,  these  nodes  at  t  are  converted  to  the  desired  base  and 
Algorithm 11 DirectedAtFromX: Generates t-sphere from X
1: procedure DirectedAtFromX(t, X, d, n)
2:     N = d^n
3:     j = 1
4:     x = base2dec(X, d) + 1
5:     nodes = cell(1)
6:     B = DirectedDistanceMat(d, n)
7:     for i from 1 to N do
8:         if B(x, i) = t then
9:             nodes{j} = dec2base(i - 1, d, n)
10:            j = j + 1
11:        end if
12:    end for
13:    return nodes
14: end procedure


placed into the cell array nodes, which is returned. Like DirectedBallFromX, DirectedAtFromX has quadratic cost with respect to $N$.
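Both directed functions can also be written without building the full distance matrix, by expanding out-neighbors level by level from $X$. The sketch below is an alternative technique and not the report's implementation: the function names are hypothetical, the string length $n$ is taken from the input string instead of being passed separately, and $d \le 10$ is assumed.

```python
def directed_levels(x, d, t):
    """levels[k] is the set of nodes at directed distance exactly k from x."""
    alphabet = "0123456789"[:d]
    seen = {x}
    levels = [{x}]
    for _ in range(t):
        # Out-neighbors of y: drop the first letter and append any letter a.
        frontier = {y[1:] + a for y in levels[-1] for a in alphabet} - seen
        seen |= frontier
        levels.append(frontier)
    return levels

def directed_ball(t, x, d):
    """Nodes within directed distance t of x (the t-out ball), sorted."""
    return sorted(set().union(*directed_levels(x, d, t)))

def directed_sphere(t, x, d):
    """Nodes at directed distance exactly t from x (the t-sphere), sorted."""
    return sorted(directed_levels(x, d, t)[t])

if __name__ == "__main__":
    print(directed_ball(1, "01", 3))
    print(directed_sphere(1, "01", 3))
```

The search touches only the nodes it returns plus their out-neighbors, so for small $t$ it avoids the $N^2$ distance evaluations made by DirectedDistanceMat.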

The undirected counterpart to DirectedAtFromX is called UndirectedAtFromX. This function also generates a list of nodes that are at a specified distance from a specified node, but the list returned can contain more nodes since the graph is undirected. This function also calls its appropriate distance matrix, UndirectedDistanceMat, to assess the distances between nodes.

Let us return to our radio tower scenario, where the tower configuration is modeled after the de Bruijn network $B(2,4)$. This time, imagine that the transmitter system underwent an upgrade, and so now it is capable of propagating signals at much higher power. Our function UndirectedAtFromX could be utilized to determine broadcast range from the transmitter tower, node 0000. Let us say we broadcast a signal at four discrete and incrementally higher power levels, 1 through 4. After each transmission, we pause and await confirmation of receipt from the other towers in our network. After our first broadcast, at power level 1, we receive notification from towers 0001 and 1000. After our broadcast at power level 2, we receive confirmation from 0010, 0011, 0100, and 1100, as well as those who previously acknowledged. At power level 3, we receive notification from towers 0101, 0110, 0111, 1001, 1010, and 1110, as well as all those who have previously acknowledged. Finally, we broadcast at power level 4, and we receive acknowledgement from towers 1011, 1101, and 1111 in addition to all other towers in the network (refer to Figure 13).
Figure 13: Radio Network B(2,4) showing Broadcast Distance
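The broadcast scenario above can be reproduced with a short breadth-first search. The following is only a sketch in Python rather than the report's Matlab functions: the helper names are hypothetical, and it assumes the usual shift-based adjacency of the undirected de Bruijn graph, with nodes written as digit strings (so d is at most 10).

```python
from collections import deque

def undirected_debruijn_neighbors(word, d):
    """Neighbors of `word` in the undirected de Bruijn graph B(d, n)."""
    letters = [str(a) for a in range(d)]
    out_nbrs = {word[1:] + a for a in letters}   # shift left, append a letter
    in_nbrs = {a + word[:-1] for a in letters}   # shift right, prepend a letter
    return (out_nbrs | in_nbrs) - {word}

def nodes_by_distance(source, d):
    """Breadth-first search; returns {distance: sorted list of nodes}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in undirected_debruijn_neighbors(u, d):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = {}
    for node, t in dist.items():
        layers.setdefault(t, []).append(node)
    return {t: sorted(nodes) for t, nodes in sorted(layers.items())}

# Towers reached for the first time at each power level from transmitter 0000.
layers = nodes_by_distance("0000", 2)
for t, nodes in layers.items():
    print(t, nodes)
```

Each layer lists the towers that first acknowledge at the corresponding power level; the cumulative union of layers 1 through t is the set of towers heard from at level t.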


4.5  Recursive  Constructions 

The standard, well-known recursive construction for the de Bruijn graph inducts
on the string length n while holding the alphabet size d fixed. This construction
illustrates the fact that B(d, n) is the line graph of B(d, n - 1). For a good
description, see [2].

Instead,  we  will  focus  on  a  new  construction  that  increases  the  alphabet  size 
while  holding  the  string  length  fixed.  Our  construction  is  as  follows. 

Construction 4.91. To construct B(d + 1, n) from B(d, n), we do the following.

1. Make d + 1 copies of B(d, n) labeled “Copy i” for i ∈ {0, 1, 2, ..., d}.

2. For each i ∈ {0, 1, 2, ..., d - 1}, in Copy i we replace every occurrence of
letter i with letter d. Leave Copy d unaltered, i.e. Copy d = B(d, n).

3. Combine all of the new copies to obtain B(d + 1, n) as follows.

(a) New vertices: Any vertex containing the letter d, i.e. all strings of
length n with at least one d.

(b) New edges: New edges come from the copies as follows:

Copy 0: All edges containing d

Copy 1: All edges containing 0, d

Copy 2: All edges containing 0, 1, d

Copy 3: All edges containing 0, 1, 2, d

...

Copy k: All edges containing 0, 1, 2, ..., k - 1, d

...

Copy d - 1: All edges containing 0, 1, 2, ..., d - 2, d

We now illustrate this construction to obtain B(4, 2) from B(3, 2). New edges
are colored. For each copy, we count how many new edges are added to ensure
that we have a total of 4^3 = 64 edges represented for B(4, 2). We will use the
principle of inclusion-exclusion to count the number of edges added.

Example: 

Copy 3: Note that Copy 3 = B(3, 2).

Copy 3   00  01  02  10  11  12  20  21  22
00        2   1   1   1   0   0   1   0   0
01        1   0   0   2   1   1   1   0   0
02        1   0   0   1   0   0   2   1   1
10        1   2   1   0   1   0   0   1   0
11        0   1   0   1   2   1   0   1   0
12        0   1   0   0   1   0   1   2   1
20        1   1   2   0   0   1   0   0   1
21        0   0   1   1   1   2   0   0   1
22        0   0   1   0   0   1   1   1   2

New edges: 3^3 = 27.

Copy 0:

Copy 0   33  31  32  13  11  12  23  21  22
33        2   1   1   1   0   0   1   0   0
31        1   0   0   2   1   1   1   0   0
32        1   0   0   1   0   0   2   1   1
13        1   2   1   0   1   0   0   1   0
11        0   1   0   1   2   1   0   1   0
12        0   1   0   0   1   0   1   2   1
23        1   1   2   0   0   1   0   0   1
21        0   0   1   1   1   2   0   0   1
22        0   0   1   0   0   1   1   1   2

New edges:

Total edges:        +3^3    27
- edges w/o 3:      -2^3    -8
                            19
Copy 1:

Copy 1   00  03  02  30  33  32  20  23  22
00        2   1   1   1   0   0   1   0   0
03        1   0   0   2   1   1   1   0   0
02        1   0   0   1   0   0   2   1   1
30        1   2   1   0   1   0   0   1   0
33        0   1   0   1   2   1   0   1   0
32        0   1   0   0   1   0   1   2   1
20        1   1   2   0   0   1   0   0   1
23        0   0   1   1   1   2   0   0   1
22        0   0   1   0   0   1   1   1   2

New edges:

Total edges:         +3^3    27
- edges w/o 0:       -2^3    -8
- edges w/o 3:       -2^3    -8
+ edges w/o 0,3:     +1^3    +1
                             12


Copy 2:

Copy 2   00  01  03  10  11  13  30  31  33
00        2   1   1   1   0   0   1   0   0
01        1   0   0   2   1   1   1   0   0
03        1   0   0   1   0   0   2   1   1
10        1   2   1   0   1   0   0   1   0
11        0   1   0   1   2   1   0   1   0
13        0   1   0   0   1   0   1   2   1
30        1   1   2   0   0   1   0   0   1
31        0   0   1   1   1   2   0   0   1
33        0   0   1   0   0   1   1   1   2

New edges:

Total edges:          +3^3    27
- edges w/o 0:        -2^3    -8
- edges w/o 1:        -2^3    -8
- edges w/o 3:        -2^3    -8
+ edges w/o 0,1:      +1^3    +1
+ edges w/o 0,3:      +1^3    +1
+ edges w/o 1,3:      +1^3    +1
- edges w/o 0,1,3:    -0^3    -0
                               6


This gives us a total number of edges of 27 + 19 + 12 + 6 = 64, as desired.
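The per-copy counts in this example can be cross-checked by direct enumeration. The sketch below is an illustration in Python, not the report's Matlab code. It assumes that an edge of B(d + 1, n) is written as an (n + 1)-string over {0, ..., d} and is credited to the copy indexed by the smallest old letter it does not contain (or to the unaltered copy if it avoids the new letter d). The function name is hypothetical.

```python
from itertools import product

def classify_edges(d, n):
    """Count the edges of B(d + 1, n) contributed by each copy of B(d, n).

    Edges are (n + 1)-strings over {0, ..., d}. Key d is the unaltered copy;
    key k < d is Copy k; key None collects strings using every letter.
    """
    counts = {}
    for edge in product(range(d + 1), repeat=n + 1):
        letters = set(edge)
        if d not in letters:
            copy = d                      # edge already present in B(d, n)
        else:
            # smallest old letter missing from the edge, None if none is missing
            copy = next((k for k in range(d) if k not in letters), None)
        counts[copy] = counts.get(copy, 0) + 1
    return counts

counts = classify_edges(3, 2)   # B(4, 2) from B(3, 2)
print(counts)
```

For B(4, 2) no string of length 3 can use all four letters, so every edge lands in exactly one copy and the four counts sum to 64.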
Algorithm

We now wish to convert this process to an algorithm for a program such as
Matlab to generate B(d + 1, n) from B(d, n). From our previous discussions and
examples, we know that we have the following new edges for each copy. We
will rename Copy d to Copy -1; the motivation for this will be clear in the
subsequent counting formulas.

Copy -1: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1}.

→ Add $d^{n+1}$ edges.

Copy 0: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1, d} \ {0} that contain
the letter d.

→ Add $d^{n+1} - \binom{1}{1}(d-1)^{n+1}$ edges.

Copy 1: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1, d} \ {1} that contain
the letters 0, d.

→ Add $d^{n+1} - \binom{2}{1}(d-1)^{n+1} + \binom{2}{2}(d-2)^{n+1}$ edges.

Copy 2: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1, d} \ {2} that contain
the letters 0, 1, d.

→ Add $d^{n+1} - \binom{3}{1}(d-1)^{n+1} + \binom{3}{2}(d-2)^{n+1} - \binom{3}{3}(d-3)^{n+1}$ edges.

...

Copy k: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1, d} \ {k} that contain
the letters 0, 1, 2, ..., k - 1, d.

→ Add $\sum_{i=0}^{k+1} (-1)^i \binom{k+1}{i} (d-i)^{n+1}$ edges.

...

Copy d - 1: All (n + 1)-strings on alphabet {0, 1, 2, ..., d - 1, d} \ {d - 1} that
contain the letters 0, 1, 2, ..., d - 2, d.

→ Add $\sum_{i=0}^{d} (-1)^i \binom{d}{i} (d-i)^{n+1}$ edges.

This gives us that the total number of edges in B(d + 1, n) is:

$$\|B(d+1,n)\| = \sum_{k=-1}^{d-1} \|\text{Copy } k\| = \sum_{k=-1}^{d-1} \sum_{i=0}^{k+1} (-1)^i \binom{k+1}{i} (d-i)^{n+1} = \sum_{j=0}^{d} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (d-i)^{n+1}$$
Note that this algorithm will never count strings that contain at least one
of every letter in the set {0, 1, 2, ..., d}, and hence we must require that n < d.
So we have now developed the following combinatorial identity.

Theorem 4.92. For n < d,

$$(d+1)^{n+1} = \sum_{j=0}^{d} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (d-i)^{n+1}.$$


To address the cases where n ≥ d, we must count the number of (n + 1)-strings
on {0, 1, 2, ..., d} in which each letter appears at least once. This is
equivalent to the number of onto functions from an (n + 1)-set to a (d + 1)-set.
It is a well-known result and a standard example of the Principle of
Inclusion-Exclusion (see [19] for a discussion of this principle) that this
number of onto functions is given by:

$$\sum_{i=0}^{d+1} (-1)^i \binom{d+1}{i} (d+1-i)^{n+1}.$$

Thus we obtain the following result that covers all cases.

Theorem 4.93. For all n, d ∈ Z+:

$$(d+1)^{n+1} = \sum_{j=0}^{d} \sum_{i=0}^{j} (-1)^i \binom{j}{i} (d-i)^{n+1} + \sum_{i=0}^{d+1} (-1)^i \binom{d+1}{i} (d+1-i)^{n+1}.$$
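The counting argument behind these two theorems can be checked numerically for small parameters. The sketch below is a sanity check in Python under our reading of the identity: the edges of B(d + 1, n) split into those contributed by Copies -1 through d - 1 plus the strings that use every letter (counted by the onto-function formula). The function names are hypothetical.

```python
from math import comb

def copy_edge_total(d, n):
    """Edges contributed by Copy -1 through Copy d - 1 (double sum over j, i)."""
    return sum((-1) ** i * comb(j, i) * (d - i) ** (n + 1)
               for j in range(d + 1) for i in range(j + 1))

def onto_count(d, n):
    """Number of onto functions from an (n + 1)-set to a (d + 1)-set."""
    return sum((-1) ** i * comb(d + 1, i) * (d + 1 - i) ** (n + 1)
               for i in range(d + 2))

for d in range(1, 6):
    for n in range(1, 6):
        total = copy_edge_total(d, n) + onto_count(d, n)
        assert total == (d + 1) ** (n + 1)
        if n < d:
            # no (n + 1)-string can use all d + 1 letters
            assert onto_count(d, n) == 0
print("identity holds for 1 <= d, n <= 5")
```

When n < d the second term vanishes, which recovers the restricted identity; for n ≥ d it supplies exactly the strings the copies miss.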


Example 

We now run through a complete example to construct B(3, 2) from B(2, 2).
Copy -1: We begin with the graph B(2, 2). See Figure 14.


Figure  14:  Copy  -1 
Copy 0: We add in a copy of B(2, 2) where the letter 0 is renamed 2. See
Figure 15 for Copy 0 and Figure 16 for the merged figure, with new edges
in red.


Figure  15:  Copy  0 


Figure  16:  Merged  Copy  -1  and  Copy  0 


Copy 1: We add in a copy of B(2, 2) where the letter 1 is renamed 2. See
Figure 17 with new edges in blue.


Figure  17:  Merged  Copy  -1,  Copy  0,  and  Copy  1 
Additional: Since n ≥ d, we must additionally consider all 3-strings that
contain every letter from {0, 1, 2} at least once. Note that this is the set of
all permutations of {0, 1, 2}, so we add in the following edges.

{012,021,102,120,201,210} 

See  Figure  18  with  new  edges  in  green. 


Figure  18:  Merged  Copy  -1,  Copy  0,  Copy  1,  and  additional  edges 


4.6  Identifying  Code  Problem  Formulations 

As  an  NP-complete  problem,  computing  base  cases  for  the  minimum  identifying 
code  problem  is  quite  challenging.  In  this  chapter,  we  explore  various  methods 
utilized  to  compute  base  cases  around  which  to  develop  conjectures. 

4.6.1  Parallel  Computing 

The pseudocode in Algorithm 12 describes our brute force algorithm,
implemented in Matlab using the Parallel Computing Toolbox.


Algorithm 12 Brute Force Algorithm
1: procedure BruteForce(d, n, k)    ▷ B(d, n) and subset size k
2:     Create list of all subsets of k nodes
3:     for i = 1 : $\binom{d^n}{k}$ do
4:         if subset i is a valid identifying code then
5:             Display subset i to user
6:         else
7:             Do nothing
8:         end if
9:     end for
10: end procedure




Approved  for  Public  Release;  Distribution  Unlimited. 


Parallelizing our algorithm takes two steps. The first step is to replace
“for” on line 3 with “parfor”. This tells Matlab to use the Parallel
Computing Toolbox and run each loop iteration independently. The second step
is to move the construction of subsets inside the parfor loop. Because the
number of subsets grows exponentially, it is more efficient to generate each
subset within the loop and discard it after the iteration than to store all
$\binom{d^n}{k}$ k-subsets and traverse the list. This is done using a
k-subset unranking algorithm. Two of these algorithms (from [16]) are listed
as Algorithms 13 and 14. These unranking functions allow us to completely
parallelize the brute force algorithm, and the results obtained are listed in
Figure 19.


Algorithm 13 Revolving Door Unranking Algorithm
1: procedure RevDoor(r, k, n)    ▷ subset index, subset size, set size
2:     x = n
3:     for i = k : 1 do
4:         while $\binom{x}{i}$ > r do
5:             x = x - 1
6:         end while
7:         t_i = x + 1
8:         r = $\binom{x+1}{i}$ - r - 1
9:     end for
10: end procedure
11: return T = (t_1, t_2, ..., t_k)


Algorithm 14 Lexicographic Unranking Algorithm
1: procedure LexUnrank(r, k, n)    ▷ subset index, subset size, set size
2:     x = 1
3:     for i = 1 : k do
4:         while r ≥ $\binom{n-x}{k-i}$ do
5:             r = r - $\binom{n-x}{k-i}$
6:             x = x + 1
7:         end while
8:         t_i = x
9:         x = x + 1
10:     end for
11: end procedure
12: return T = (t_1, t_2, ..., t_k)
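To make the interplay between unranking and the brute force search concrete, here is a small serial sketch in Python (not the Matlab/parfor implementation described above). It assumes 0-based ranks and 0-based node labels, labels each node of B(d, n) by the base-d value of its string, and uses hypothetical function names throughout. Because each loop iteration depends only on its rank r, the loop body is what one would hand to a parallel map.

```python
from math import comb

def lex_unrank(r, k, n):
    """Return the k-subset of {0, ..., n-1} with 0-based lexicographic rank r."""
    subset, x = [], 0
    for i in range(1, k + 1):
        # comb(n - x - 1, k - i) subsets have x as their i-th element; skip them
        while comb(n - x - 1, k - i) <= r:
            r -= comb(n - x - 1, k - i)
            x += 1
        subset.append(x)
        x += 1
    return subset

def balls(d, n):
    """Closed neighborhoods B(v) in the undirected de Bruijn graph B(d, n)."""
    size = d ** n
    ball = [{v} for v in range(size)]
    for v in range(size):
        for a in range(d):
            w = (v * d + a) % size     # shift left and append letter a
            ball[v].add(w)
            ball[w].add(v)
    return ball

def is_identifying_code(code, ball):
    """True if every B(v) meets `code` in a distinct, non-empty set."""
    code, seen = set(code), set()
    for b in ball:
        ident = frozenset(b & code)
        if not ident or ident in seen:
            return False
        seen.add(ident)
    return True

def brute_force(d, n, k):
    """All identifying codes of size k in B(d, n), generated rank by rank."""
    ball, size = balls(d, n), d ** n
    found = []
    for r in range(comb(size, k)):     # each iteration is independent
        candidate = lex_unrank(r, k, size)
        if is_identifying_code(candidate, ball):
            found.append(candidate)
    return found

print(brute_force(2, 3, 4))
```

Generating the subset from its rank inside the loop means no list of all subsets is ever stored, which is the memory saving the text describes.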


4.6.2  D-Wave  Quantum  Annealing  Machine 

Under the collaborative effort “Adiabatic Quantum Computing Applications
Research” (14-RI-CRADA-02) between the Information Directorate and Lockheed
Martin Corporation, we aim to extend the results obtained by the parallel
computing method. In general, the D-Wave machine can address a class of Ising
problems natively in hardware. As stated in the D-Wave user documents,
“The D-Wave hardware can be viewed as a hardware heuristic which minimizes
Ising objective functions using a physically realized version of quantum
annealing.” [9] The Ising model is an energy minimization problem over
-1/+1-valued variables. It can be converted to a quadratic unconstrained binary
optimization (QUBO) problem that uses 0/1-valued variables, and so the two are
often used interchangeably.

d\n   2   3   4   5
2     x   4   6   12
3     4   9
4     5
5     6

Figure 19: Results for B(d,n) obtained using HPC

Binary  Optimization  Model 

We  present  a  binary  optimization  formula  for  the  1-identifying  code  problem. 
Adjustments  must  be  made  to  create  a  quadratic  version.  We  will  define  this 
model  using  three  separate  functions:  one  to  show  that  the  set  has  the  correct 
size,  one  to  show  that  the  set  is  dominating,  and  one  to  show  that  the  set  is 
separating  (or  identifying). 

Variable  Definitions 

We will use the notation B(v) for v ∈ V(G), where B(v) = N(v) ∪ {v}. In other
words, B(v) is the set containing all vertices adjacent to v, plus v itself.

We define the variables as follows.

$$x_{vu} = \begin{cases} 1, & \text{if } u \in B(v) \cap S; \\ 0, & \text{otherwise.} \end{cases}$$


Set S has size k

We define the first function, $H_A$, as follows.

$$H_A = \Big(k - \sum_{v} x_{vv}\Big)^2$$

= 0 iff |S| = k.

Set S is a dominating set

By definition, this is equal to ∀v ∈ G, B(v) ∩ S ≠ ∅. This is equivalent to
the following.

$$\forall v \in G, \quad \prod_{u \in B(v)} (1 - x_{vu}) = 0$$

From this statement, we get the following equation for our second function.

$$H_B = \sum_{v} (1 - x_{vv}) \cdot \prod_{u \in N(v)} (1 - x_{vu})$$

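Set S is a separating set

By definition, this is equal to ∀x, y ∈ G, (B(x) ∩ S) △ (B(y) ∩ S) ≠ ∅. This
is equivalent to the following for a specific pair x ≠ y.

$$\begin{aligned}
& (B(x) \cap S) \,\triangle\, (B(y) \cap S) \ne \emptyset \\
\Leftrightarrow\ & \exists v \in (B(x) \cap S) \,\triangle\, (B(y) \cap S) \\
\Leftrightarrow\ & \exists v,\ (v \in B(x) \cap S) \oplus (v \in B(y) \cap S) \\
\Leftrightarrow\ & \exists v,\ (x_{xv} = 1) \oplus (x_{yv} = 1) \\
\Leftrightarrow\ & \exists v,\ \big(1 - (x_{xv} + x_{yv}) = 0\big) \\
\Leftrightarrow\ & \prod_{v} (1 - x_{xv} - x_{yv}) = 0
\end{aligned}$$

From this statement, we get the following equation for our third function,
summed over all pairs x, y.

$$H_C = \sum_{x \ne y} \prod_{v} (1 - x_{xv} - x_{yv})^2$$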

The  Binary  Optimization  Model 


From  these  three  functions,  our  binary  optimization  model  is  the  following. 

$$H(S) = H_A(S) + H_B(S) + H_C(S)$$

=  0  iff  S  is  an  identifying  code. 

Note that while this does provide a binary optimization model for our
problem, it is not quadratic. In order to convert H(S) to a quadratic binary
equation, each higher order term must be replaced with several new variables.
While this is possible, it is a time-consuming and arduous process that
introduces many new variables. Hence this approach will likely not be the most
efficient implementation.
Integer Program Formulation

From [20], we have the following integer program formulation of the minimum
identifying code problem in directed graphs.

First, we define the modified adjacency matrix as follows. It is the
adjacency matrix plus the identity matrix.

$$A_{ij} = \begin{cases} 1, & \text{if } (i,j) \in E \text{ or } i = j; \\ 0, & \text{otherwise.} \end{cases}$$

Using this definition, we see that a ball of radius 1 surrounding vertex i is
given by the following vector (here n denotes the number of vertices).

$$B(i) = [A_{1i}, A_{2i}, \ldots, A_{ni}]^T$$

Our vertex subset S is defined as the following vector.

$$S = [s_1, s_2, \ldots, s_n]^T \quad \text{where } s_i = \begin{cases} 1, & \text{if } i \in S; \\ 0, & \text{otherwise.} \end{cases}$$

To compare two identifying sets with respect to S for vertices i and j, the
following expression computes the size of (B(i) ∩ S) △ (B(j) ∩ S).

$$\sum_{k=1}^{n} |A_{ki} - A_{kj}| \cdot s_k$$

This implies that in order for S to be a valid 1-identifying code, we must
have the following inequality satisfied for all pairs of distinct vertices i
and j.

$$\sum_{k=1}^{n} |A_{ki} - A_{kj}| \cdot s_k \ge 1$$

For the dominating property to be satisfied, we require the following
additional inequality.

$$A \cdot S \ge \mathbf{1}^T$$

Thus our integer program is given by the following.

$$\begin{aligned}
\min\ & |S| \\
\text{s.t.}\ & \sum_{k=1}^{n} |A_{ki} - A_{kj}| \cdot s_k \ge 1, \quad \forall i \ne j \\
& A \cdot S \ge \mathbf{1}^T \\
& s_k \in \{0, 1\}
\end{aligned}$$

In order to use these ideas for the D-Wave machine, our constraints must be
equalities. This means we must add binary slack variables for each inequality.
For the first set of inequalities, we must determine an upper bound for each
inequality. Since these correspond to the constraint
|(B(i) ∩ S) △ (B(j) ∩ S)| ≥ 1, an easy upper bound is given by the following.

$$|B(i)| + |B(j)| \ge |(B(i) \cap S) \,\triangle\, (B(j) \cap S)| \ge 1$$

For the class of de Bruijn graphs, we are able to use this to get a bound on the
number of slack variables needed. Since the maximum size of any ball in B(d, n)
is 2d + 1, this gives us an upper bound of size 4d + 2 for this class of graphs.
Hence for each inequality in this set, we must add 4d + 2 binary slack variables
to convert the inequality to an equality. Since there are $d^n(d^n - 1)/2$
possible pairs i, j, this implies that we must add a huge number of binary slack
variables, equal to the following expression, just to satisfy the first set of
inequalities.

$$d^n (d^n - 1)(2d + 1) \text{ slack variables}$$

Hence, this method is not going to be an efficient way to map our problem onto
the D-Wave machine.
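As a quick illustration of how fast this count grows, the arithmetic can be written out as below. This is only a sketch of the bound derived above (pairs of nodes times 4d + 2 slack variables per pair); the function name is hypothetical.

```python
def slack_variable_count(d, n):
    """Upper bound on binary slack variables for the separation constraints."""
    nodes = d ** n
    pairs = nodes * (nodes - 1) // 2     # unordered pairs i, j
    slack_per_pair = 2 * (2 * d + 1)     # |B(i)| + |B(j)| <= 4d + 2
    return pairs * slack_per_pair

for d, n in [(2, 3), (2, 4), (2, 5), (3, 3)]:
    print(d, n, slack_variable_count(d, n))
```

For B(2, 4) the bound evaluates to 16 · 15 · 5 = 1200 slack variables, before any are added for the domination constraints.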


Satisfiability  Formulation 

This  approach  formulates  the  identifying  code  problem  as  a  boolean  satisfiability 
problem.  Each  term  in  the  satisfiability  problem  is  mapped  to  an  Ising  model. 
The  mapping  introduces  auxiliary  variables  so  that  the  Ising  model  contains 
only  quadratic  and  linear  terms.  A  graph  embedding  technique  is  then  used  to 
map  this  Ising  model  onto  the  connectivity  of  the  D-Wave  chip.  Finally,  gauge 
transformations  are  used  to  mitigate  the  effects  of  intrinsic  control  errors.  We 
will  illustrate  each  step  with  an  example  of  B(d,n)  when  d  =  2  and  n  =  3.  As 
stated  previously,  these  methods  easily  apply  to  any  graph. 

Satisfiability  Formulation 

First, we label the nodes of B(2, 3) from 0 to d^n - 1 = 7. Then we define
the set of variables {x_i} for i = 0, 1, ..., 7 as follows.

$$x_i = \begin{cases} 1, & \text{if node } i \text{ is included in the identifying code;} \\ 0, & \text{otherwise.} \end{cases}$$

Next, we look at the ball for each node and form clauses corresponding to
their domination constraints.

Ball            Contents                                      Constraints
B(0) = B(000)   {000, 001, 100} = {0, 1, 4}                   x_0 ∨ x_1 ∨ x_4
B(1) = B(001)   {000, 001, 010, 011, 100} = {0, 1, 2, 3, 4}   x_0 ∨ x_1 ∨ x_2 ∨ x_3 ∨ x_4
B(2) = B(010)   {001, 010, 100, 101} = {1, 2, 4, 5}           x_1 ∨ x_2 ∨ x_4 ∨ x_5
B(3) = B(011)   {001, 011, 101, 110, 111} = {1, 3, 5, 6, 7}   x_1 ∨ x_3 ∨ x_5 ∨ x_6 ∨ x_7
B(4) = B(100)   {000, 001, 010, 100, 110} = {0, 1, 2, 4, 6}   x_0 ∨ x_1 ∨ x_2 ∨ x_4 ∨ x_6
B(5) = B(101)   {010, 011, 101, 110} = {2, 3, 5, 6}           x_2 ∨ x_3 ∨ x_5 ∨ x_6
B(6) = B(110)   {011, 100, 101, 110, 111} = {3, 4, 5, 6, 7}   x_3 ∨ x_4 ∨ x_5 ∨ x_6 ∨ x_7
B(7) = B(111)   {011, 110, 111} = {3, 6, 7}                   x_3 ∨ x_6 ∨ x_7

From this set of constraints, we form clauses for each pairwise XOR of balls.
This is shown in Figure 20.

Now we can eliminate any clause that is implied by a shorter clause, i.e. one
whose variables form a subset of its own. For example, Figure 21 shows which
two-term constraints imply the corresponding larger constraints. Hence, the
only constraints that we have left are given below.

x_2 ∨ x_3
x_0 ∨ x_2 ∨ x_5
x_2 ∨ x_6
x_0 ∨ x_3 ∨ x_5
x_3 ∨ x_6
x_0 ∨ x_5 ∨ x_6
x_1 ∨ x_2 ∨ x_7
x_1 ∨ x_4
x_1 ∨ x_5
x_2 ∨ x_4 ∨ x_7
x_2 ∨ x_5 ∨ x_7
x_4 ∨ x_5

Satisfying this set of constraints are the four possible minimum solutions,
given below.

{x_1, x_2, x_3, x_5}
{x_1, x_2, x_5, x_6}
{x_2, x_3, x_4, x_5}
{x_2, x_4, x_5, x_6}
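The clause reduction for B(2, 3) can be reproduced mechanically. The following Python sketch is an illustration, not the Matlab function used for the report. It takes the balls listed in the table above as given, forms one clause per ball and one per pairwise symmetric difference, drops any clause that strictly contains another, and then searches exhaustively for the smallest satisfying sets. The function names are hypothetical.

```python
from itertools import combinations

# Balls of B(2, 3), nodes labeled 0..7, as listed in the table above.
BALLS = {
    0: {0, 1, 4}, 1: {0, 1, 2, 3, 4}, 2: {1, 2, 4, 5}, 3: {1, 3, 5, 6, 7},
    4: {0, 1, 2, 4, 6}, 5: {2, 3, 5, 6}, 6: {3, 4, 5, 6, 7}, 7: {3, 6, 7},
}

def minimal_clauses(balls):
    """Domination and separation clauses, keeping only inclusion-minimal ones."""
    clauses = {frozenset(b) for b in balls.values()}
    clauses |= {frozenset(balls[i] ^ balls[j])
                for i, j in combinations(balls, 2)}
    minimal = [c for c in clauses if not any(other < c for other in clauses)]
    return sorted(minimal, key=sorted)

def minimum_solutions(clauses, num_vars):
    """Smallest variable sets that intersect every clause."""
    for size in range(1, num_vars + 1):
        hits = [set(s) for s in combinations(range(num_vars), size)
                if all(c & set(s) for c in clauses)]
        if hits:
            return hits
    return []

clauses = minimal_clauses(BALLS)
solutions = minimum_solutions(clauses, 8)
print([sorted(c) for c in clauses])
print(solutions)
```

The exhaustive search is only practical for very small graphs; it is included here to confirm the reduced clause list, not as a substitute for the annealing approach.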

Mapping  Satisfiability  Clauses  to  Ising  Models 

We construct a Hamiltonian

$$\mathcal{H} = \sum_{j} \mathcal{H}_j(\{x_i, i \in A_j\}) + \lambda \sum_{i} x_i.$$

Each of the terms has the property that

$$x^* = \arg\min \mathcal{H}_j(\{x_i, i \in A_j\}) \quad \text{iff} \quad \bigvee_{i \in A_j} x_i \text{ is true.}$$

We will show momentarily how the $\mathcal{H}_j$ are constructed. The last
term, with λ > 0, is a penalty term that rewards shorter length codes.
Therefore, the minimum solutions (or ground states) of $\mathcal{H}$ are the
minimum 1-identifying codes.

In order to solve the Hamiltonian using adiabatic quantum optimization,
we have the further constraint that the $\mathcal{H}_j$ must contain only
quadratic and linear terms in the binary variables {x_i}. In general, to
accomplish this we must introduce auxiliary variables, which we will denote by
{z_i}. Also, we will switch to the Ising model convention where each of the
x_i and z_i can take values {+1, -1} instead of {0, 1}.

The mapping from OR-clauses to Ising models that we will use, namely

$$\bigvee_{i \in A_j} x_i \;\mapsto\; \mathcal{H}_j(\{x_i, i \in A_j\}),$$

depends only on the number of variables k = |A_j| in the OR-clause. These
mappings for k = 2 through k = 6 are represented diagrammatically in Figure
22.

In the diagrams, numbers attached to a node represent the linear coefficients
in the Ising model, while numbers attached to an edge represent the quadratic
(coupling) coefficients in the Ising model. For example, the diagram for k = 3
represents the following Ising model.

$$\mathcal{H}_3(x_1, x_2, x_3, z_1) = x_1 x_2 - 2 x_1 z_1 - 2 x_2 z_1 + z_1 x_3 + x_1 + x_2 - 3 z_1 - x_3$$

It can be confirmed that every ground state of
$\mathcal{H}_3(x_1, x_2, x_3, z_1)$ satisfies x_1 ∨ x_2 ∨ x_3, and conversely
every combination of {x_1, x_2, x_3} that satisfies x_1 ∨ x_2 ∨ x_3
corresponds to a ground state of $\mathcal{H}_3(x_1, x_2, x_3, z_1)$.
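The ground-state claim for the k = 3 model is small enough to confirm by enumerating all 16 spin configurations. The sketch below is a check in Python under two assumptions that are ours rather than the report's: spin +1 encodes "true", and the model is read as x1 x2 - 2 x1 z1 - 2 x2 z1 + z1 x3 + x1 + x2 - 3 z1 - x3, i.e. each coupling appears once.

```python
from itertools import product

def h3(x1, x2, x3, z1):
    """Ising energy of the 3-OR gadget; all spins take values in {-1, +1}."""
    return (x1 * x2 - 2 * x1 * z1 - 2 * x2 * z1 + z1 * x3
            + x1 + x2 - 3 * z1 - x3)

def ground_states():
    """Minimum energy and the (x1, x2, x3) projections of all ground states."""
    energies = {s: h3(*s) for s in product((-1, 1), repeat=4)}
    e_min = min(energies.values())
    return e_min, {s[:3] for s, e in energies.items() if e == e_min}

e_min, satisfying = ground_states()
print(e_min, sorted(satisfying))
```

Under these assumptions, the only assignment of (x1, x2, x3) that never reaches the minimum energy is the all-false one, which is exactly the behavior required of an OR-clause gadget.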

Mapping  the  Ising  model  onto  the  D-Wave  processor 

In general, the graph of the Hamiltonian we get from the satisfiability-to-Ising
mapping will not fit onto the D-Wave hardware graph. The D-Wave hardware
graph, which is called the “Chimera” graph, is built up of unit cells, each of
which is a four by four bipartite graph, $K_{4,4}$. Even the simple Ising model
for 3-OR shown in Figure 22 cannot be mapped directly onto the D-Wave hardware
graph. This can be seen from the fact that our graph B(2, 3) contains a 3-cycle,
whereas the smallest cycle possible on the D-Wave hardware graph is a 4-cycle.

Embedding 


Our first step in embedding is to determine how to map our OR-clauses
to the physical qubits. One way to embed the 3-OR graph onto the D-Wave
is shown in Figure 23. We have mapped the logical qubit $z_1$ to two physical
qubits, which are ferromagnetically coupled with a coupling strength $-J_{FM}$.

Similarly, once we have constructed the full Ising Hamiltonian for the minimum
1-identifying code problem, we can use embedding to map the graph of the
Hamiltonian onto the D-Wave hardware graph.

Problem  Decomposition 


If the graph of the Hamiltonian is too large to embed onto our current
504-qubit D-Wave hardware graph, one trick we can try is to decompose the
satisfiability problem into smaller pieces that can be embedded. For example,
in the B(2, 3) example from earlier, one of the terms is x_2 ∨ x_3. In order for
this to be true, at least one of x_2 and x_3 must be true. We consider and solve
each case separately.

If x_2 is true, then so is any OR-clause containing x_2, so we can eliminate
those from the conjunction. We are left with the following clauses.

x_0 ∨ x_3 ∨ x_5
x_3 ∨ x_6
x_0 ∨ x_5 ∨ x_6
x_1 ∨ x_4
x_1 ∨ x_5
x_4 ∨ x_5

Similarly, in the case where x_3 is true we can eliminate terms containing x_3,
leaving the following clauses.

x_0 ∨ x_2 ∨ x_5
x_2 ∨ x_6
x_0 ∨ x_5 ∨ x_6
x_1 ∨ x_2 ∨ x_7
x_1 ∨ x_4
x_1 ∨ x_5
x_2 ∨ x_4 ∨ x_7
x_2 ∨ x_5 ∨ x_7
x_4 ∨ x_5

Both of these subproblems are simpler than the original problem and hence
easier to embed. Whichever subproblem yields the smaller identifying codes will
be the solution of our original problem, or if both subproblems have minimum
solutions of the same length, then we can take the union of the two solution
sets.

Figure 23: Embedding 3-OR onto the Chimera Graph

Gauge  Transformations 


Embedding complex graphs leads to long chains, i.e. multiple physical qubits
corresponding to the same logical qubit. In the current D-Wave embedding
solver, all of the physical qubits in a chain will be ferromagnetically coupled
with some coupling strength $-J_{FM}$, which is iteratively increased to a large
enough magnitude to ensure that all of the physical qubits in the chain agree
(most of the time) and can be treated as a single logical qubit.

Figure 24: Example Gauge Transformation

However, due to the characteristics of the D-Wave design, the control
precision for implementing ferromagnetic couplings is somewhat worse than for
antiferromagnetic couplings. So, embeddings with many long chains will have a
greater tendency for control precision errors, which may affect solution
quality. To combat these effects, we can utilize a gauge transformation, where
we redefine a subset of the spin variables to be the opposite sign. Flipping a
subset of the spins in this way induces a transformation on the Ising
coefficients: if $s_i$ has been flipped ($s_i' = -s_i$), then $h_i' = -h_i$. If
exactly one of $s_1$ and $s_2$ has been flipped, then $J_{12}' = -J_{12}$. But
if both of $s_1$ and $s_2$ have been flipped, then $J_{12}' = J_{12}$.

Consider for example the gauge transformation shown in Figure 24. In the
figure, the red qubits are flipped by the gauge transformation while the blue
qubits are unchanged. In the first unit cell, all of the horizontal qubits are
flipped while the vertical qubits are unchanged. In the next unit cell, all of
the vertical qubits are flipped while the horizontal qubits are unchanged. So,
suppose our embedding contains the chain 1 (blue vertical qubit), 5 (red
horizontal qubit), 13 (blue horizontal qubit). Note that each consecutive pair
in the chain contains exactly one flipped qubit, so all of the ferromagnetic
couplings $-J_{FM}$ in the chain will be replaced with antiferromagnetic
couplings $+J_{FM}$ by the gauge transformation. By using gauge transformations
like this, we may be able to reduce the control precision errors caused by
embeddings with long chains.
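The effect of a gauge transformation on the coefficients, and the fact that it leaves the energy landscape unchanged up to relabeling of spins, can be seen on a toy three-qubit chain. The Python sketch below is purely illustrative: the dictionaries `h`, `J`, and `g`, the numeric values, and the function names are assumptions made for the example and are not D-Wave API calls.

```python
from itertools import product

def ising_energy(h, J, s):
    """E(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j."""
    energy = sum(h[i] * s[i] for i in h)
    energy += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return energy

def gauge_transform(h, J, g):
    """Apply gauge g_i in {+1, -1}: h_i -> g_i h_i, J_ij -> g_i g_j J_ij."""
    h2 = {i: g[i] * hi for i, hi in h.items()}
    J2 = {(i, j): g[i] * g[j] * Jij for (i, j), Jij in J.items()}
    return h2, J2

# A chain of three qubits with ferromagnetic couplings -J_FM (J_FM = 2 here).
h = {0: 0.5, 1: -1.0, 2: 0.0}
J = {(0, 1): -2.0, (1, 2): -2.0}
g = {0: 1, 1: -1, 2: 1}            # flip only the middle qubit of the chain

h2, J2 = gauge_transform(h, J, g)
print(J2)                          # both couplings become antiferromagnetic

# Energies agree once the flipped spins are relabeled.
for spins in product((-1, 1), repeat=3):
    s = dict(enumerate(spins))
    s_flipped = {i: g[i] * s[i] for i in s}
    assert ising_energy(h, J, s) == ising_energy(h2, J2, s_flipped)
```

Because each consecutive pair in the chain contains exactly one flipped qubit, both chain couplings change sign, mirroring the 1, 5, 13 chain discussed above.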

Results for B(2,4)

By  combining  all  of  the  above  tricks,  we  obtained  results  for  the  minimum 
identifying  code  problem  on  the  d  =  2,  n  =  4  undirected  de  Bruijn  graph  on  the 
D-Wave  hardware. 
SAT formulation


Figure 25 shows the full satisfiability formulation of the 1-identifying code
for B(2, 4). This formulation was generated using a Matlab function written by
Steve Adachi. It contains 50 clauses over 16 variables.

Problem Decomposition


For the full satisfiability formulation in Figure 25, the Ising model was too
large to embed on the current D-Wave hardware graphs. The problem must be
decomposed further. Note that two of the terms are x_3 ∨ x_11 and x_4 ∨ x_12.
Thus, at least one of x_3 and x_11 must be true, and at least one of x_4 and
x_12 must be true. Considering the branch where x_3 and x_4 are true, the
satisfiability problem reduces to the problem shown in Figure 26. This
decomposed formulation consists of just 24 clauses over 14 variables.

Satisfiability  clause  to  Ising  model  mapping 


Using the Ising model mappings shown in Figure 22, we generated an Ising
model with 49 auxiliary variables {z_i} for a total of 63 variables. We
furthermore added the penalty term $\lambda \sum_i x_i$ so that the ground state
will be a minimum 1-identifying code. Note that this is far better than the
1200+ auxiliary (slack) variables required in the integer program formulation
for this case.

Figure 27 shows the logical graph of the Ising model. In the figure, nodes
corresponding to the original 14 boolean variables are shown in green; the
remaining nodes represent the auxiliary variables added during the
satisfiability-to-Ising mapping process.

Embedding 


Using the D-Wave embedding function sapiFindEmbedding(), we found an
embedding of the Ising model onto the current (504-qubit) hardware graph for
the Lockheed Martin D-Wave machine that uses 253 physical qubits, with a
maximum chain length of 8. This is shown in Figure 28.

In  Figure  28,  physical  qubits  corresponding  to  the  same  logical  qubit  have 
the  same  color  and  are  labeled  with  the  same  number.  The  unlabeled  red  qubits 
are  known  faulty  qubits  and  are  not  used. 

Gauge  Transformations 


Since we have an exact solution from the parallel computing method on the
graph B(2, 4), we know that the x_3 = x_4 = 1 branch of the problem should have
a ground state with the remaining variables

x_8 = x_10 = x_13 = x_14 = 1

corresponding to the minimum code length of 6. However, using the standard
D-Wave embedding solver (which does not yet support gauge transformations;
it uses all ferromagnetic couplings for the chains), we were not able to find
this ground state. Even using the maximum allowed combinations of D-Wave
function parameters annealing_time and num_reads (the number of annealing
runs per call), the best that we could obtain were solutions corresponding to a
code length of 7.

On the other hand, using a homegrown equivalent of the embedding solver,
which also has the capability to incorporate gauge transformations, we were
able to find the above ground state corresponding to a code of length 6.

Figure 25: Satisfiability Formulation for B(2,4)

Figure 26: Decomposed Satisfiability Formulation for x_3, x_4 true.

Figure 27: Logical graph of the Ising model

Figure 28: Embedding the problem onto the D-Wave hardware

4.6.3  Satisfiability  Modulo  Theory  Solvers 

Satisfiability Modulo Theory (SMT) is a current area of research that is
concerned with the satisfiability of formulas with respect to some background
theory [3]. These solvers combine boolean satisfiability solving with decision
procedures for specific theories. For example, consider the following problem.

a = b + 1, c < a, c > b

In the theory of the integers, this problem is not satisfiable, since no
integer lies strictly between b and b + 1; however, in the theory of the real
numbers it is satisfiable. In general, solving an SMT problem consists of first
solving a satisfiability problem, then doing theory-specific reasoning, and
then possibly going back and changing the satisfiability problem. This process
is repeated if necessary. In addition, multiple theories can be used in the
same satisfiability modulo theory problem instance, which may require
additional repeats of this method.

To use these solvers on our identifying code problem for the undirected de
Bruijn graph, we must first come up with a formulation of the problem using
decision procedures. The graph B(d,n) contains d^n nodes. For each of these,
we create a boolean variable that denotes whether or not the node is part of
the identifying code. We then also create an array of boolean variables for that
node's identifying set. An assertion is added to make sure that each element of
the array is true if and only if the corresponding neighbor's boolean variable is
true (i.e. if and only if the neighbor is part of the identifying code). To ensure
unique codes, we add a statement requiring that each node's identifying set is
distinct from every other node's identifying set. Then, to get codes of a fixed
size, we create an integer variable for each node and add the constraints that the
integer is at least 0 and no greater than 1. Next we add an assertion that each
node's integer variable is 1 if and only if its boolean variable is true. Finally,
we add a constraint that the sum of all of the integer variables is equal to the
desired identifying code size.
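
The constraints described above can be sketched without any solver as a check
on a single truth assignment. The following is a minimal illustration in plain
Python, not the Z3 encoding used for this work. The function names, the
allow_empty flag, and the use of closed neighborhoods (a node together with its
neighbors) are assumptions made for the example; closed neighborhoods are
consistent with the appendix remark that 01 and 10 are twins in B(2,2).

```python
from itertools import product


def closed_neighborhoods(d, n):
    """Closed neighborhood N[v] of every node of the undirected B(d, n), d <= 10."""
    digits = "0123456789"[:d]
    nbrs = {"".join(w): {"".join(w)} for w in product(digits, repeat=n)}
    for v in list(nbrs):
        for a in digits:
            w = v[1:] + a      # de Bruijn shift edge v -> w
            nbrs[v].add(w)
            nbrs[w].add(v)     # undirected: record the edge in both directions
    return nbrs


def satisfies_formulation(nbrs, in_code, size, allow_empty=False):
    """Check one assignment of the per-node booleans against the constraints.

    in_code maps each node to its boolean variable (True = node is in the code).
    allow_empty is an assumption of this sketch: it controls whether one node
    may have an empty identifying set.
    """
    # One 0/1 integer per node, equal to 1 iff the boolean is true; fixed sum.
    if sum(1 if in_code[v] else 0 for v in nbrs) != size:
        return False
    # Identifying set of v: members of N[v] whose boolean is true.
    ident = {v: frozenset(u for u in nbrs[v] if in_code[u]) for v in nbrs}
    if not allow_empty and any(not s for s in ident.values()):
        return False
    # Every node's identifying set must differ from every other node's.
    return len(set(ident.values())) == len(nbrs)


nbrs = closed_neighborhoods(2, 3)
code = {"001", "010", "011", "101"}
print(satisfies_formulation(nbrs, {v: v in code for v in nbrs}, size=4))  # True
```

In an SMT encoding the same three conditions become assertions over the
solver's variables, and the solver searches for an assignment instead of being
handed one.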




d\n     2     3     4     5     6      7
2       x     4     6    12   (24)  (110)
3       4     9
4       5    15
5       6
6       8
7       9
8     (10)

Figure 29: Minimum 1-identifying codes on B(d, n)

Now that the formulation of the problem has been determined, we can use an
off-the-shelf solver to find solutions. For this work, we used the solver Z3, made
by Microsoft Research. We begin by picking a code length and asking
whether there exists an identifying code of that length. If not, then the code
length is increased by 1 and the problem is posed to Z3 again. This continues
until an identifying code of a specific size is found. To find all satisfying
models, after a single model is found an assertion is inserted into the
formulation that eliminates the previously found identifying code as
an option. This forces Z3 to produce a different solution, or to state that the
formulation is unsatisfiable (and hence no more identifying codes of that size
exist). This process is repeated in a loop to obtain all identifying codes.
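
The search loop just described (grow the code length until a code exists, then
exclude each code found and ask again) can be sketched as follows. This is an
illustration under stated assumptions, not the code used for this work:
find_code is a hypothetical brute-force stand-in for a single call to Z3, the
set named blocked plays the role of the added exclusion assertions, and
identifying sets are taken to be non-empty intersections with closed
neighborhoods.

```python
from itertools import combinations, product


def closed_neighborhoods(d, n):
    """Closed neighborhood N[v] of every node of the undirected B(d, n), d <= 10."""
    digits = "0123456789"[:d]
    nbrs = {"".join(w): {"".join(w)} for w in product(digits, repeat=n)}
    for v in list(nbrs):
        for a in digits:
            w = v[1:] + a      # de Bruijn shift edge v -> w
            nbrs[v].add(w)
            nbrs[w].add(v)     # undirected: record the edge in both directions
    return nbrs


def find_code(nbrs, size, blocked):
    """Stand-in for one solver call.

    Returns an identifying code of the given size that is not in `blocked`,
    or None when no such code exists (the "unsatisfiable" answer).
    """
    for cand in combinations(sorted(nbrs), size):
        code = frozenset(cand)
        if code in blocked:
            continue
        ident = {frozenset(nbrs[v] & code) for v in nbrs}
        if len(ident) == len(nbrs) and frozenset() not in ident:
            return code
    return None


def all_minimum_codes(nbrs, start_size=1):
    """Increase the size until a code exists, then enumerate every code."""
    for size in range(start_size, len(nbrs) + 1):
        blocked = set()
        while True:
            code = find_code(nbrs, size, blocked)
            if code is None:       # no further model of this size
                break
            blocked.add(code)      # exclude this model and ask again
        if blocked:
            return size, blocked
    return None, set()


size, codes = all_minimum_codes(closed_neighborhoods(2, 3))
print(size, sorted(sorted(c) for c in codes))
```

For B(2,3) the loop stops at size 4, in agreement with Figure 29. The stand-in
rescans all candidates on every call, so it is only practical for very small
graphs; a real solver keeps what it has learned between calls.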

Using this approach on a single core, we were able to reproduce our results
for B(d,n) from the parallel computing method in much less time. See Figure
29 for a summary of these results. The numbers in parentheses denote that we
found a code of that size, but did not eliminate the possibility of a smaller code
existing.

Because of the advancements in current satisfiability and satisfiability
modulo theory solvers, they offer the potential to scale much better than a
parallelized brute force approach. This is due in part to the fact that many of
today's solvers can recognize which partial assignments already make the
formula unsatisfiable, and hence they avoid every model that contains those
assignments. In our problem, this might correspond to a case in which nodes A
and B have the same identifying set. In this case, the solver would not bother
looking at combinations of True/False assignments on the other nodes that do
not affect the identifying sets of A or B.
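
The size of this saving can be illustrated on B(2,4). The sketch below is a
simplified picture of the pruning idea only and makes no claim about how Z3
works internally; the function name completions_in_conflict and the choice of
nodes 0000 and 1111 are assumptions made for the example. It fixes the booleans
of the nodes that can appear in the identifying set of A or B so that the two
sets coincide, then counts by brute force how many assignments of the remaining
nodes are still in conflict.

```python
from itertools import product


def closed_neighborhoods(d, n):
    """Closed neighborhood N[v] of every node of the undirected B(d, n), d <= 10."""
    digits = "0123456789"[:d]
    nbrs = {"".join(w): {"".join(w)} for w in product(digits, repeat=n)}
    for v in list(nbrs):
        for a in digits:
            w = v[1:] + a      # de Bruijn shift edge v -> w
            nbrs[v].add(w)
            nbrs[w].add(v)     # undirected: record the edge in both directions
    return nbrs


def completions_in_conflict(nbrs, partial, a, b):
    """Count completions of `partial` in which a and b share an identifying set.

    Returns (conflicting, total), where total is the number of completions.
    """
    free = [v for v in nbrs if v not in partial]
    conflicting = 0
    for bits in product([False, True], repeat=len(free)):
        full = {**partial, **dict(zip(free, bits))}
        code = {v for v in nbrs if full[v]}
        if nbrs[a] & code == nbrs[b] & code:
            conflicting += 1
    return conflicting, 2 ** len(free)


nbrs = closed_neighborhoods(2, 4)
a, b = "0000", "1111"
# Decide only the six nodes that can appear in the identifying set of a or b;
# setting them all to False leaves both identifying sets empty.
partial = {v: False for v in nbrs[a] | nbrs[b]}
print(completions_in_conflict(nbrs, partial, a, b))  # (1024, 1024)
```

All 1024 assignments of the ten undecided nodes remain in conflict, so a
solver that records the conflict once can discard them together, whereas a
brute force search would test each one.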

In addition to the sophistication of today's solvers, there is also the
possibility of parallelizing the search. While some instances were run manually
in a parallel manner for this experiment, there is some research to be done on
automatically parallelizing the search in order to further our known minimum
results.

4.7  Minimum  Identifying  Code  Examples

In  this  appendix,  we  give  examples  of  some  minimum  identifying  codes  on  de 
Bruijn  networks.  These  identifying  codes  allow  a  vertex  to  be  identified  by  the 
empty  set. 

4.7.1  Size  of  Minimum  Identifying  Codes 


B(d, n)

d\n     2     3     4     5
2       x     4     6    12
3       4     8
4       4
5       6

4.7.2  Number  of  Minimum  Identifying  Codes 


B(d, n)

d\n     2     3     4      5
2       x     4    44   1694
3       3   156
4      36
5     500

4.7.3  Complete  Sets  of  Min.  Identifying  Codes 

B(2, 2): Since {01, 10} are twin vertices, no identifying code is possible.

B(3, 2): The minimum identifying codes have size 4.

•  {01,02,10,20} 

•  {01,10,12,21} 

•  {02,12,20,21} 

B(4, 2): The minimum identifying codes have size 4.

•  {01,02,10,20} 

•  {01,02,10,30} 

•  {01,02,20,30} 

•  {01,03,10,20} 

•  {01,03,10,30} 

•  {01,03,20,30} 

•  {01,10,12,21} 

•  {01,10,12,31} 

•  {01,10,13,21} 

•  {01,10,13,31} 

•  {01,12,13,21}

•  {01,12,13,31}

•  {02,03,10,20}

•  {02,03,10,30}

•  {02,03,20,30}

•  {02,12,20,21}

•  {02,12,20,23}

•  {02,12,21,23}

•  {02,20,21,32}

•  {02,20,23,32}

•  {02,21,23,32}

•  {03,13,30,31}

•  {03,13,30,32}

•  {03,13,31,32}

•  {03,23,30,31}

•  {03,23,30,32}

•  {03,23,31,32}

•  {10,12,21,31}

•  {10,13,21,31}

•  {12,13,21,31}

•  {12,20,21,32}

•  {12,20,23,32}

•  {12,21,23,32}

•  {13,23,30,31}

•  {13,23,30,32}

•  {13,23,31,32}

B(2, 3): The minimum identifying codes have size 4.

•  {001,010,011,101} 

•  {001,010,101,110} 

•  {010,011,100,101} 

•  {010,100,101,110} 




Conclusion 


At the conclusion of this effort we have obtained new results, bounds, and
algorithms for t-identifying codes on B(d, n); however, there is still room for
improvement. In particular, the problem of finding t-identifying codes in the
undirected de Bruijn graphs remains an open and challenging problem of interest.

For  future  efforts,  we  suggest  the  following  two  avenues.  First,  we  aim  to 
continue  analyzing  de  Bruijn  graphs  and  their  internal  structures  that  have 
proven  useful  in  many  applications.  This  includes  continuation  of  our  current 
exploration  of  identifying  codes  and  expanding  to  consider  other  related  graph 
structures,  such  as  robust  identifying  codes  that  are  resilient  against  node  and 
link  failure.  We  will  continue  our  quest  for  constructions  of  optimal  identifying 
codes  in  the  undirected  de  Bruijn  networks  as  well  as  consider  approximation 
algorithms.  This  will  include  examining  existing  algorithms  for  identifying  codes 
and  modifying  the  methods  to  take  advantage  of  the  de  Bruijn  graph  properties. 
Additional  key  vertex  subsets  will  also  be  considered,  such  as  resolving  sets  and 
locating  dominating  sets. 

Our second research direction will be to consider variations on de Bruijn
networks and perform similar analyses. Traditional de Bruijn networks are based
on strings over a fixed alphabet, and variations that have yet to be examined are
based on different combinatorial objects such as permutations. A different
variation on de Bruijn networks, known as alphabet overlap graphs, provides a
much denser, more highly connected variant of the de Bruijn graph. These graphs
are relative newcomers to the academic arena, so a complete analysis of their
structural properties is needed to determine their relevance and applicability to
real-world networks.
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